Master's in Data Science - Machine Learning

Handling Missing Values, Outliers, and Correlations

Author: Ramón Morillo Barrera

Dataset: Application data

In this notebook we carry out a graphical exploratory analysis to visualize and understand how the variables behave. We handle null (missing) values and outliers, and study the correlation between variables.

As mentioned earlier, the train-test split will be stratified because the target variable is imbalanced.

Libraries

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.impute import KNNImputer
from termcolor import colored, cprint
import scipy.stats as ss
import warnings
import sys
from scipy.stats import chi2_contingency

warnings.filterwarnings('ignore')

pd.set_option('display.max_columns', 500)
pd.set_option('display.max_rows', 500)

Functions

In [2]:
sys.path.append('../src')
import funciones_auxiliares as f_aux
sys.path.remove('../src')

# Constant
seed = 12354

Loading the dataset

In [3]:
df_loan = pd.read_csv('../../data_loan_status/interim/data_preprocessing/pd_data_initial_preprocessing.csv')
df_loan.head()
Out[3]:
SK_ID_CURR COMMONAREA_AVG COMMONAREA_MEDI COMMONAREA_MODE NONLIVINGAPARTMENTS_AVG NONLIVINGAPARTMENTS_MEDI NONLIVINGAPARTMENTS_MODE FONDKAPREMONT_MODE LIVINGAPARTMENTS_MEDI LIVINGAPARTMENTS_AVG LIVINGAPARTMENTS_MODE FLOORSMIN_MODE FLOORSMIN_AVG FLOORSMIN_MEDI YEARS_BUILD_MODE YEARS_BUILD_MEDI YEARS_BUILD_AVG OWN_CAR_AGE LANDAREA_MEDI LANDAREA_AVG LANDAREA_MODE BASEMENTAREA_MODE BASEMENTAREA_AVG BASEMENTAREA_MEDI EXT_SOURCE_1 NONLIVINGAREA_AVG NONLIVINGAREA_MODE NONLIVINGAREA_MEDI ELEVATORS_MEDI ELEVATORS_AVG ELEVATORS_MODE WALLSMATERIAL_MODE APARTMENTS_AVG APARTMENTS_MODE APARTMENTS_MEDI ENTRANCES_MEDI ENTRANCES_MODE ENTRANCES_AVG LIVINGAREA_AVG LIVINGAREA_MODE LIVINGAREA_MEDI HOUSETYPE_MODE FLOORSMAX_MODE FLOORSMAX_AVG FLOORSMAX_MEDI YEARS_BEGINEXPLUATATION_MODE YEARS_BEGINEXPLUATATION_AVG YEARS_BEGINEXPLUATATION_MEDI TOTALAREA_MODE EMERGENCYSTATE_MODE OCCUPATION_TYPE EXT_SOURCE_3 AMT_REQ_CREDIT_BUREAU_WEEK AMT_REQ_CREDIT_BUREAU_MON AMT_REQ_CREDIT_BUREAU_HOUR AMT_REQ_CREDIT_BUREAU_DAY AMT_REQ_CREDIT_BUREAU_YEAR AMT_REQ_CREDIT_BUREAU_QRT NAME_TYPE_SUITE OBS_60_CNT_SOCIAL_CIRCLE DEF_60_CNT_SOCIAL_CIRCLE OBS_30_CNT_SOCIAL_CIRCLE DEF_30_CNT_SOCIAL_CIRCLE EXT_SOURCE_2 AMT_GOODS_PRICE AMT_ANNUITY CNT_FAM_MEMBERS DAYS_LAST_PHONE_CHANGE HOUR_APPR_PROCESS_START REG_REGION_NOT_LIVE_REGION ORGANIZATION_TYPE NAME_CONTRACT_TYPE FLAG_OWN_CAR CODE_GENDER AMT_CREDIT AMT_INCOME_TOTAL CNT_CHILDREN NAME_INCOME_TYPE NAME_FAMILY_STATUS NAME_HOUSING_TYPE REGION_POPULATION_RELATIVE NAME_EDUCATION_TYPE DAYS_BIRTH DAYS_EMPLOYED DAYS_REGISTRATION DAYS_ID_PUBLISH FLAG_MOBIL FLAG_EMP_PHONE FLAG_WORK_PHONE FLAG_CONT_MOBILE TARGET FLAG_OWN_REALTY LIVE_REGION_NOT_WORK_REGION FLAG_EMAIL REGION_RATING_CLIENT REGION_RATING_CLIENT_W_CITY WEEKDAY_APPR_PROCESS_START FLAG_PHONE REG_CITY_NOT_LIVE_CITY REG_CITY_NOT_WORK_CITY LIVE_CITY_NOT_WORK_CITY REG_REGION_NOT_WORK_REGION FLAG_DOCUMENT_4 FLAG_DOCUMENT_5 FLAG_DOCUMENT_2 FLAG_DOCUMENT_3 FLAG_DOCUMENT_11 FLAG_DOCUMENT_10 FLAG_DOCUMENT_9 FLAG_DOCUMENT_8 
FLAG_DOCUMENT_7 FLAG_DOCUMENT_6 FLAG_DOCUMENT_12 FLAG_DOCUMENT_13 FLAG_DOCUMENT_19 FLAG_DOCUMENT_18 FLAG_DOCUMENT_17 FLAG_DOCUMENT_16 FLAG_DOCUMENT_15 FLAG_DOCUMENT_14 FLAG_DOCUMENT_20 FLAG_DOCUMENT_21
0 100002 0.0143 0.0144 0.0144 0.0000 0.0000 0.0 reg oper account 0.0205 0.0202 0.022 0.1250 0.1250 0.1250 0.6341 0.6243 0.6192 NaN 0.0375 0.0369 0.0377 0.0383 0.0369 0.0369 0.083037 0.0000 0.0 0.00 0.00 0.00 0.0000 Stone, brick 0.0247 0.0252 0.0250 0.0690 0.0690 0.0690 0.0190 0.0198 0.0193 block of flats 0.0833 0.0833 0.0833 0.9722 0.9722 0.9722 0.0149 No Laborers 0.139376 0.0 0.0 0.0 0.0 1.0 0.0 Unaccompanied 2.0 2.0 2.0 2.0 0.262949 351000.0 24700.5 1.0 -1134.0 10 0 Business Entity Type 3 Cash loans N M 406597.5 202500.0 0 Working Single / not married House / apartment 0.018801 Secondary / secondary special -9461 -637 -3648.0 -2120 1 1 0 1 1 Y 0 0 2 2 WEDNESDAY 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 100003 0.0605 0.0608 0.0497 0.0039 0.0039 0.0 reg oper account 0.0787 0.0773 0.079 0.3333 0.3333 0.3333 0.8040 0.7987 0.7960 NaN 0.0132 0.0130 0.0128 0.0538 0.0529 0.0529 0.311267 0.0098 0.0 0.01 0.08 0.08 0.0806 Block 0.0959 0.0924 0.0968 0.0345 0.0345 0.0345 0.0549 0.0554 0.0558 block of flats 0.2917 0.2917 0.2917 0.9851 0.9851 0.9851 0.0714 No Core staff NaN 0.0 0.0 0.0 0.0 0.0 0.0 Family 1.0 0.0 1.0 0.0 0.622246 1129500.0 35698.5 2.0 -828.0 11 0 School Cash loans N F 1293502.5 270000.0 0 State servant Married House / apartment 0.003541 Higher education -16765 -1188 -1186.0 -291 1 1 0 1 0 N 0 0 1 1 MONDAY 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2 100004 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 26.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Laborers 0.729567 0.0 0.0 0.0 0.0 0.0 0.0 Unaccompanied 0.0 0.0 0.0 0.0 0.555912 135000.0 6750.0 1.0 -815.0 9 0 Government Revolving loans Y M 135000.0 67500.0 0 Working Single / not married House / apartment 0.010032 Secondary / secondary special -19046 -225 -4260.0 -2531 1 1 1 1 0 Y 0 0 2 2 MONDAY 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 100006 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Laborers NaN NaN NaN NaN NaN NaN NaN Unaccompanied 2.0 0.0 2.0 0.0 0.650442 297000.0 29686.5 2.0 -617.0 17 0 Business Entity Type 3 Cash loans N F 312682.5 135000.0 0 Working Civil marriage House / apartment 0.008019 Secondary / secondary special -19005 -3039 -9833.0 -2437 1 1 0 1 0 Y 0 0 2 2 WEDNESDAY 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
4 100007 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Core staff NaN 0.0 0.0 0.0 0.0 0.0 0.0 Unaccompanied 0.0 0.0 0.0 0.0 0.322738 513000.0 21865.5 1.0 -1106.0 11 0 Religion Cash loans N M 513000.0 121500.0 0 Working Single / not married House / apartment 0.028663 Secondary / secondary special -19932 -3038 -4311.0 -3458 1 1 0 1 0 Y 0 0 2 2 THURSDAY 0 0 1 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
In [4]:
df_loan.columns
Out[4]:
Index(['SK_ID_CURR', 'COMMONAREA_AVG', 'COMMONAREA_MEDI', 'COMMONAREA_MODE',
       'NONLIVINGAPARTMENTS_AVG', 'NONLIVINGAPARTMENTS_MEDI',
       'NONLIVINGAPARTMENTS_MODE', 'FONDKAPREMONT_MODE',
       'LIVINGAPARTMENTS_MEDI', 'LIVINGAPARTMENTS_AVG',
       ...
       'FLAG_DOCUMENT_12', 'FLAG_DOCUMENT_13', 'FLAG_DOCUMENT_19',
       'FLAG_DOCUMENT_18', 'FLAG_DOCUMENT_17', 'FLAG_DOCUMENT_16',
       'FLAG_DOCUMENT_15', 'FLAG_DOCUMENT_14', 'FLAG_DOCUMENT_20',
       'FLAG_DOCUMENT_21'],
      dtype='object', length=122)

Converting categorical variable types

I convert the variables of type object to category.

In [4]:
list_var_cat, other = f_aux.dame_variables_categoricas(dataset=df_loan)
df_loan[list_var_cat] = df_loan[list_var_cat].astype("category")
list_var_continuous = list(df_loan.select_dtypes('float').columns)
df_loan[list_var_continuous] = df_loan[list_var_continuous].astype(float)
df_loan.dtypes
Out[4]:
SK_ID_CURR                         int64
COMMONAREA_AVG                   float64
COMMONAREA_MEDI                  float64
COMMONAREA_MODE                  float64
NONLIVINGAPARTMENTS_AVG          float64
NONLIVINGAPARTMENTS_MEDI         float64
NONLIVINGAPARTMENTS_MODE         float64
FONDKAPREMONT_MODE              category
LIVINGAPARTMENTS_MEDI            float64
LIVINGAPARTMENTS_AVG             float64
LIVINGAPARTMENTS_MODE            float64
FLOORSMIN_MODE                   float64
FLOORSMIN_AVG                    float64
FLOORSMIN_MEDI                   float64
YEARS_BUILD_MODE                 float64
YEARS_BUILD_MEDI                 float64
YEARS_BUILD_AVG                  float64
OWN_CAR_AGE                      float64
LANDAREA_MEDI                    float64
LANDAREA_AVG                     float64
LANDAREA_MODE                    float64
BASEMENTAREA_MODE                float64
BASEMENTAREA_AVG                 float64
BASEMENTAREA_MEDI                float64
EXT_SOURCE_1                     float64
NONLIVINGAREA_AVG                float64
NONLIVINGAREA_MODE               float64
NONLIVINGAREA_MEDI               float64
ELEVATORS_MEDI                   float64
ELEVATORS_AVG                    float64
ELEVATORS_MODE                   float64
WALLSMATERIAL_MODE              category
APARTMENTS_AVG                   float64
APARTMENTS_MODE                  float64
APARTMENTS_MEDI                  float64
ENTRANCES_MEDI                   float64
ENTRANCES_MODE                   float64
ENTRANCES_AVG                    float64
LIVINGAREA_AVG                   float64
LIVINGAREA_MODE                  float64
LIVINGAREA_MEDI                  float64
HOUSETYPE_MODE                  category
FLOORSMAX_MODE                   float64
FLOORSMAX_AVG                    float64
FLOORSMAX_MEDI                   float64
YEARS_BEGINEXPLUATATION_MODE     float64
YEARS_BEGINEXPLUATATION_AVG      float64
YEARS_BEGINEXPLUATATION_MEDI     float64
TOTALAREA_MODE                   float64
EMERGENCYSTATE_MODE             category
OCCUPATION_TYPE                 category
EXT_SOURCE_3                     float64
AMT_REQ_CREDIT_BUREAU_WEEK       float64
AMT_REQ_CREDIT_BUREAU_MON        float64
AMT_REQ_CREDIT_BUREAU_HOUR       float64
AMT_REQ_CREDIT_BUREAU_DAY        float64
AMT_REQ_CREDIT_BUREAU_YEAR       float64
AMT_REQ_CREDIT_BUREAU_QRT        float64
NAME_TYPE_SUITE                 category
OBS_60_CNT_SOCIAL_CIRCLE         float64
DEF_60_CNT_SOCIAL_CIRCLE         float64
OBS_30_CNT_SOCIAL_CIRCLE         float64
DEF_30_CNT_SOCIAL_CIRCLE         float64
EXT_SOURCE_2                     float64
AMT_GOODS_PRICE                  float64
AMT_ANNUITY                      float64
CNT_FAM_MEMBERS                  float64
DAYS_LAST_PHONE_CHANGE           float64
HOUR_APPR_PROCESS_START         category
REG_REGION_NOT_LIVE_REGION      category
ORGANIZATION_TYPE               category
NAME_CONTRACT_TYPE              category
FLAG_OWN_CAR                    category
CODE_GENDER                     category
AMT_CREDIT                       float64
AMT_INCOME_TOTAL                 float64
CNT_CHILDREN                    category
NAME_INCOME_TYPE                category
NAME_FAMILY_STATUS              category
NAME_HOUSING_TYPE               category
REGION_POPULATION_RELATIVE       float64
NAME_EDUCATION_TYPE             category
DAYS_BIRTH                         int64
DAYS_EMPLOYED                      int64
DAYS_REGISTRATION                float64
DAYS_ID_PUBLISH                    int64
FLAG_MOBIL                      category
FLAG_EMP_PHONE                  category
FLAG_WORK_PHONE                 category
FLAG_CONT_MOBILE                category
TARGET                          category
FLAG_OWN_REALTY                 category
LIVE_REGION_NOT_WORK_REGION     category
FLAG_EMAIL                      category
REGION_RATING_CLIENT            category
REGION_RATING_CLIENT_W_CITY     category
WEEKDAY_APPR_PROCESS_START      category
FLAG_PHONE                      category
REG_CITY_NOT_LIVE_CITY          category
REG_CITY_NOT_WORK_CITY          category
LIVE_CITY_NOT_WORK_CITY         category
REG_REGION_NOT_WORK_REGION      category
FLAG_DOCUMENT_4                 category
FLAG_DOCUMENT_5                 category
FLAG_DOCUMENT_2                 category
FLAG_DOCUMENT_3                 category
FLAG_DOCUMENT_11                category
FLAG_DOCUMENT_10                category
FLAG_DOCUMENT_9                 category
FLAG_DOCUMENT_8                 category
FLAG_DOCUMENT_7                 category
FLAG_DOCUMENT_6                 category
FLAG_DOCUMENT_12                category
FLAG_DOCUMENT_13                category
FLAG_DOCUMENT_19                category
FLAG_DOCUMENT_18                category
FLAG_DOCUMENT_17                category
FLAG_DOCUMENT_16                category
FLAG_DOCUMENT_15                category
FLAG_DOCUMENT_14                category
FLAG_DOCUMENT_20                category
FLAG_DOCUMENT_21                category
dtype: object
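
The helper `f_aux.dame_variables_categoricas` lives in `../src` and is not shown in this notebook. As a rough illustration only, the sketch below shows one way such a conversion could work: text columns and integer columns with few distinct values are cast to category. The function name `cast_low_cardinality_to_category` and the `max_unique` threshold are hypothetical and are not the author's implementation.

```python
import pandas as pd

def cast_low_cardinality_to_category(df, max_unique=100):
    """Hypothetical helper: cast text columns and low-cardinality integer columns to category."""
    out = df.copy()
    cat_cols = []
    for col in out.columns:
        # Text columns (object or string dtype) are treated as categorical
        is_text = (pd.api.types.is_object_dtype(out[col])
                   or pd.api.types.is_string_dtype(out[col]))
        # Integer columns with few distinct values (flags, ratings) as well
        is_small_int = (pd.api.types.is_integer_dtype(out[col])
                        and out[col].nunique() < max_unique)
        if is_text or is_small_int:
            cat_cols.append(col)
    out[cat_cols] = out[cat_cols].astype("category")
    return out, cat_cols

# Toy data reusing three column names from the dataset (values are made up)
toy = pd.DataFrame({
    "CODE_GENDER": ["M", "F", "F", "M"],
    "FLAG_PHONE": [1, 0, 0, 1],
    "AMT_CREDIT": [406597.5, 1293502.5, 135000.0, 312682.5],
})
toy_cast, toy_cat_cols = cast_low_cardinality_to_category(toy, max_unique=3)
print(toy_cast.dtypes)
```

Storing such columns as category usually lowers memory use and makes it easy to select them later with `select_dtypes('category')`.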

Stratified train-test split

I will split the dataset into train and test while preserving the proportion of the target variable. Before that, I plot the proportion of that variable.

In [9]:
target_count = df_loan.groupby('TARGET').agg({'TARGET':'count'}).reset_index(drop=True)
target_count['value'] = list(target_count.index)
target_count
Out[9]:
TARGET value
0 282686 0
1 24825 1
In [10]:
df_plot_loan_status = df_loan['TARGET']\
        .value_counts(normalize=True)\
        .mul(100).rename('percent').reset_index()

df_plot_loan_status_conteo = df_loan['TARGET'].value_counts(normalize=True).reset_index()
df_plot_loan_status_conteo
Out[10]:
TARGET proportion
0 0 0.919271
1 1 0.080729
In [8]:
sns.set_theme(style="whitegrid")

fig, ax = plt.subplots(figsize=(10, 6))  # Larger figure size

# Bar chart
sns.barplot(
    data=target_count,
    x='value',
    y='TARGET',
    ax=ax,
    hue='value',
    dodge=False,  # Avoids separation between bars
    palette="pastel",
    edgecolor="0.2"    # Adds borders to the bars
)

# Title and axis labels
ax.set_title('Value counts of the TARGET variable', fontsize=18, fontweight='bold', color='darkblue')
ax.set_ylabel('Count', fontsize=14, color='darkgrey')
ax.set_xlabel('Value', fontsize=14, color='darkgrey')

# Add the count labels above the bars
for container in ax.containers:
    ax.bar_label(container, fmt='{:,.0f}', label_type="edge", padding=3, fontsize=12, color="black")
[Figure: bar chart of TARGET value counts (0: 282,686; 1: 24,825)]
In [9]:
sns.set_theme(style="whitegrid")

fig, ax = plt.subplots(figsize=(10, 6))  # Larger figure size

# Bar chart
sns.barplot(
    data=df_plot_loan_status_conteo,
    x='TARGET',
    y='proportion',
    ax=ax,
    hue='TARGET',
    dodge=False,  # Avoids separation between bars
    palette="pastel",
    edgecolor="0.2"    # Adds borders to the bars
)

# Title and axis labels (this chart shows proportions, not counts)
ax.set_title('Proportion of values of the TARGET variable', fontsize=18, fontweight='bold', color='darkblue')
ax.set_ylabel('Proportion', fontsize=14, color='darkgrey')
ax.set_xlabel('Value', fontsize=14, color='darkgrey')

# Add the percentage labels above the bars
for container in ax.containers:
    ax.bar_label(container, fmt='{:,.2%}', label_type="edge", padding=3, fontsize=12, color="black")
[Figure: bar chart of TARGET class proportions (0: 91.93%; 1: 8.07%)]

I computed and plotted the values of the TARGET variable in order to check that, after splitting into train and test, the class proportions are preserved thanks to stratification. As noted in the previous notebook, our target variable is clearly imbalanced.

In [12]:
X_df_loan, X_df_loan_test, y_df_loan, y_df_loan_test = train_test_split(df_loan.drop('TARGET',axis=1), 
                                                                     df_loan['TARGET'], 
                                                                     stratify=df_loan['TARGET'], 
                                                                     test_size=0.2)
df_loan_train = pd.concat([X_df_loan, y_df_loan],axis=1)
df_loan_test = pd.concat([X_df_loan_test, y_df_loan_test],axis=1)
In [11]:
print(f'''
\033[1mTRAIN\033[0m:
{y_df_loan.value_counts(normalize=True)}

\033[1mTEST\033[0m:
{y_df_loan_test.value_counts(normalize=True)}''')
TRAIN:
TARGET
0    0.919271
1    0.080729
Name: proportion, dtype: float64

TEST:
TARGET
0    0.919272
1    0.080728
Name: proportion, dtype: float64

The stratified split worked correctly: the proportion of the TARGET variable is the same in train and in test.
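
The split above does not pass `random_state`, so each run produces a different partition. The following is a minimal sketch on synthetic data, assuming the `seed` constant defined earlier is intended for this purpose (the notebook does not say so). It shows how fixing `random_state` makes a stratified split reproducible while keeping the class proportions. The wrapper `stratified_split` is a hypothetical name introduced only for this example.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

seed = 12354  # same value as the notebook's constant; using it here is an assumption

# Synthetic, imbalanced toy data: 8% positives, roughly like TARGET
rng = np.random.default_rng(seed)
toy = pd.DataFrame({"x": rng.normal(size=1000), "TARGET": [1] * 80 + [0] * 920})

def stratified_split(df, target, test_size=0.2, random_state=None):
    """Hypothetical wrapper: stratified split returning train and test DataFrames."""
    X_train, X_test, y_train, y_test = train_test_split(
        df.drop(columns=target), df[target],
        stratify=df[target], test_size=test_size, random_state=random_state,
    )
    return pd.concat([X_train, y_train], axis=1), pd.concat([X_test, y_test], axis=1)

train_a, test_a = stratified_split(toy, "TARGET", random_state=seed)
train_b, test_b = stratified_split(toy, "TARGET", random_state=seed)

# Share of positives in each partition
print(train_a["TARGET"].mean(), test_a["TARGET"].mean())
# True when both runs selected exactly the same rows in the same order
print(train_a.index.equals(train_b.index))
```

A fixed seed matters here because the null counts and outlier tables computed later depend on which rows end up in the training set.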

Descriptive visualization of the data

We now look at the proportion of null values by column and by row, together with a descriptive visualization of how the other variables relate to the TARGET variable.

In [13]:
# Null counts per column and per row of the training set, sorted in descending order
pd_series_null_columns = df_loan_train.isnull().sum().sort_values(ascending=False)
pd_series_null_rows = df_loan_train.isnull().sum(axis=1).sort_values(ascending=False)
print(pd_series_null_columns.shape, pd_series_null_rows.shape)

pd_null_columnas = pd.DataFrame(pd_series_null_columns, columns=['nulos_columnas'])
pd_null_filas = pd.DataFrame(pd_series_null_rows, columns=['nulos_filas'])
# TARGET is aligned by index, so it is taken from the training set itself
pd_null_filas['TARGET'] = df_loan_train['TARGET'].copy()
# Column share: nulls / number of rows; row share: nulls / number of columns
pd_null_columnas['porcentaje_columnas'] = pd_null_columnas['nulos_columnas']/df_loan_train.shape[0]
pd_null_filas['porcentaje_filas']= pd_null_filas['nulos_filas']/df_loan_train.shape[1]
(122,) (246008,)
In [13]:
pd_null_columnas
Out[13]:
nulos_columnas porcentaje_columnas
COMMONAREA_AVG 171986 0.699107
COMMONAREA_MEDI 171986 0.699107
COMMONAREA_MODE 171986 0.699107
NONLIVINGAPARTMENTS_MODE 170889 0.694648
NONLIVINGAPARTMENTS_AVG 170889 0.694648
NONLIVINGAPARTMENTS_MEDI 170889 0.694648
FONDKAPREMONT_MODE 168323 0.684218
LIVINGAPARTMENTS_MEDI 168253 0.683933
LIVINGAPARTMENTS_AVG 168253 0.683933
LIVINGAPARTMENTS_MODE 168253 0.683933
FLOORSMIN_AVG 166988 0.678791
FLOORSMIN_MODE 166988 0.678791
FLOORSMIN_MEDI 166988 0.678791
YEARS_BUILD_MODE 163643 0.665194
YEARS_BUILD_AVG 163643 0.665194
YEARS_BUILD_MEDI 163643 0.665194
OWN_CAR_AGE 162412 0.660190
LANDAREA_MEDI 146133 0.594017
LANDAREA_MODE 146133 0.594017
LANDAREA_AVG 146133 0.594017
BASEMENTAREA_AVG 144042 0.585518
BASEMENTAREA_MODE 144042 0.585518
BASEMENTAREA_MEDI 144042 0.585518
EXT_SOURCE_1 138763 0.564059
NONLIVINGAREA_AVG 135798 0.552006
NONLIVINGAREA_MODE 135798 0.552006
NONLIVINGAREA_MEDI 135798 0.552006
ELEVATORS_MODE 131122 0.532999
ELEVATORS_MEDI 131122 0.532999
ELEVATORS_AVG 131122 0.532999
WALLSMATERIAL_MODE 125217 0.508996
APARTMENTS_MEDI 124936 0.507853
APARTMENTS_AVG 124936 0.507853
APARTMENTS_MODE 124936 0.507853
ENTRANCES_MODE 123980 0.503967
ENTRANCES_MEDI 123980 0.503967
ENTRANCES_AVG 123980 0.503967
LIVINGAREA_AVG 123562 0.502268
LIVINGAREA_MEDI 123562 0.502268
LIVINGAREA_MODE 123562 0.502268
HOUSETYPE_MODE 123521 0.502102
FLOORSMAX_MODE 122520 0.498033
FLOORSMAX_MEDI 122520 0.498033
FLOORSMAX_AVG 122520 0.498033
YEARS_BEGINEXPLUATATION_AVG 120062 0.488041
YEARS_BEGINEXPLUATATION_MODE 120062 0.488041
YEARS_BEGINEXPLUATATION_MEDI 120062 0.488041
TOTALAREA_MODE 118803 0.482923
EMERGENCYSTATE_MODE 116665 0.474233
OCCUPATION_TYPE 77178 0.313722
EXT_SOURCE_3 48754 0.198181
AMT_REQ_CREDIT_BUREAU_HOUR 33250 0.135158
AMT_REQ_CREDIT_BUREAU_WEEK 33250 0.135158
AMT_REQ_CREDIT_BUREAU_MON 33250 0.135158
AMT_REQ_CREDIT_BUREAU_YEAR 33250 0.135158
AMT_REQ_CREDIT_BUREAU_DAY 33250 0.135158
AMT_REQ_CREDIT_BUREAU_QRT 33250 0.135158
NAME_TYPE_SUITE 1049 0.004264
DEF_30_CNT_SOCIAL_CIRCLE 836 0.003398
OBS_60_CNT_SOCIAL_CIRCLE 836 0.003398
DEF_60_CNT_SOCIAL_CIRCLE 836 0.003398
OBS_30_CNT_SOCIAL_CIRCLE 836 0.003398
EXT_SOURCE_2 523 0.002126
AMT_GOODS_PRICE 227 0.000923
AMT_ANNUITY 6 0.000024
DAYS_LAST_PHONE_CHANGE 1 0.000004
SK_ID_CURR 0 0.000000
CNT_FAM_MEMBERS 0 0.000000
HOUR_APPR_PROCESS_START 0 0.000000
REG_REGION_NOT_LIVE_REGION 0 0.000000
ORGANIZATION_TYPE 0 0.000000
NAME_CONTRACT_TYPE 0 0.000000
FLAG_OWN_CAR 0 0.000000
CODE_GENDER 0 0.000000
AMT_CREDIT 0 0.000000
AMT_INCOME_TOTAL 0 0.000000
CNT_CHILDREN 0 0.000000
NAME_INCOME_TYPE 0 0.000000
NAME_FAMILY_STATUS 0 0.000000
NAME_HOUSING_TYPE 0 0.000000
REGION_POPULATION_RELATIVE 0 0.000000
NAME_EDUCATION_TYPE 0 0.000000
DAYS_BIRTH 0 0.000000
DAYS_EMPLOYED 0 0.000000
DAYS_REGISTRATION 0 0.000000
DAYS_ID_PUBLISH 0 0.000000
FLAG_MOBIL 0 0.000000
FLAG_EMP_PHONE 0 0.000000
FLAG_WORK_PHONE 0 0.000000
FLAG_CONT_MOBILE 0 0.000000
FLAG_OWN_REALTY 0 0.000000
LIVE_REGION_NOT_WORK_REGION 0 0.000000
FLAG_EMAIL 0 0.000000
REGION_RATING_CLIENT 0 0.000000
REGION_RATING_CLIENT_W_CITY 0 0.000000
WEEKDAY_APPR_PROCESS_START 0 0.000000
FLAG_PHONE 0 0.000000
REG_CITY_NOT_LIVE_CITY 0 0.000000
REG_CITY_NOT_WORK_CITY 0 0.000000
LIVE_CITY_NOT_WORK_CITY 0 0.000000
REG_REGION_NOT_WORK_REGION 0 0.000000
FLAG_DOCUMENT_4 0 0.000000
FLAG_DOCUMENT_5 0 0.000000
FLAG_DOCUMENT_2 0 0.000000
FLAG_DOCUMENT_3 0 0.000000
FLAG_DOCUMENT_11 0 0.000000
FLAG_DOCUMENT_10 0 0.000000
FLAG_DOCUMENT_9 0 0.000000
FLAG_DOCUMENT_8 0 0.000000
FLAG_DOCUMENT_7 0 0.000000
FLAG_DOCUMENT_6 0 0.000000
FLAG_DOCUMENT_12 0 0.000000
FLAG_DOCUMENT_13 0 0.000000
FLAG_DOCUMENT_19 0 0.000000
FLAG_DOCUMENT_18 0 0.000000
FLAG_DOCUMENT_17 0 0.000000
FLAG_DOCUMENT_16 0 0.000000
FLAG_DOCUMENT_15 0 0.000000
FLAG_DOCUMENT_14 0 0.000000
FLAG_DOCUMENT_20 0 0.000000
FLAG_DOCUMENT_21 0 0.000000
TARGET 0 0.000000
In [14]:
pd_null_filas
Out[14]:
nulos_filas TARGET porcentaje_filas
150206 61 0 0.5
197736 61 0 0.5
269492 61 0 0.5
185713 61 0 0.5
269786 61 0 0.5
... ... ... ...
74499 0 0 0.0
190162 0 0 0.0
152091 0 0 0.0
158164 0 0 0.0
151687 0 0 0.0

246008 rows × 3 columns
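
The two tables above divide null counts by the number of rows (for columns) and by the number of columns (for rows). As a compact alternative, and only as a sketch on made-up toy data, the same shares can be obtained directly with `isnull().mean()`, since the mean of a boolean mask is the fraction of `True` values:

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, np.nan],
    "b": [np.nan, np.nan, 1.0, 2.0],
    "c": [1, 2, 3, 4],
})

# Fraction of nulls per column (mean over rows)
null_share_cols = toy.isnull().mean().sort_values(ascending=False)
# Fraction of nulls per row (mean over columns)
null_share_rows = toy.isnull().mean(axis=1).sort_values(ascending=False)

print(null_share_cols)
print(null_share_rows)
```

Here columns `a` and `b` are half empty, and the row with index 1 has two of its three values missing.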

Next, we visualize the distribution of the numeric and categorical variables against the TARGET variable.

I generate lists of variables by type, using a function I wrote for this study, so that they can be plotted next.

In [24]:
df_loan_bool, df_loan_cat, df_loan_num = f_aux.lista_valores(df_loan)

I wrote a custom function to plot the distribution of each variable on its own and its distribution against the target variable. The function treats numeric and categorical variables separately in order to choose how each is plotted.

The function is located in the src folder, in the file 'funciones_auxiliares.py'.

In [16]:
warnings.filterwarnings('ignore')
for i in list(df_loan_train.columns):
    if i in df_loan_num:
        f_aux.double_plot(df_loan_train, col_name=i, is_cont=True, target='TARGET')
    elif (i in df_loan_bool or i in df_loan_cat) and i != 'TARGET':
        f_aux.double_plot(df_loan_train, col_name=i, is_cont=False, target='TARGET')
[Figures: one double_plot output per variable, showing its distribution and its relationship with TARGET]
In [17]:
df_loan_train['ORGANIZATION_TYPE'] = df_loan_train['ORGANIZATION_TYPE'].astype('category')

f_aux.double_plot(df_loan_train, col_name='ORGANIZATION_TYPE', is_cont=False, target='TARGET')
[Figure: double_plot output for ORGANIZATION_TYPE against TARGET]

Analysis of the plots

Looking at the plotted variables, a few details stand out, such as the imbalance of the target variable mentioned earlier and the number of null values in some variables, which we will handle later. Below we comment on how some variables behave in relation to our target variable, TARGET.

  1. Clients with older cars tend to fall behind on loan payments.

  2. Payment difficulty appears to increase for clients with lower scores in EXT_SOURCE_1, EXT_SOURCE_2, and EXT_SOURCE_3, which are normalized scores from external data sources.

  3. Clients whose homes have wooden walls are the most likely to fall behind on loan payments.

  4. Clients in less-skilled jobs (low-skill laborers, drivers, waiters) are more likely to fall behind on loan payments.

  5. As the number of credit bureau enquiries made before the loan application (AMT_REQ_CREDIT_BUREAU) increases, so does the probability of late repayment.

  6. The larger the client's family, the more likely the client is to fall behind on one of the loan payments.

  7. If the client changed mobile phone (DAYS_LAST_PHONE_CHANGE) relatively recently, the probability of payment difficulties increases.

  8. Men are more likely than women to have difficulty repaying the loan (CODE_GENDER).

  9. The more children the client has, the greater the payment difficulty (CNT_CHILDREN).

  10. Clients who are on maternity leave or unemployed are more likely to have difficulty repaying the loan (NAME_INCOME_TYPE).

  11. Clients with a higher level of education are less likely to have difficulty repaying the loan (NAME_EDUCATION_TYPE).

  12. Younger clients (DAYS_BIRTH) appear to have more difficulty repaying the loan.

  13. Clients who changed their ID document (DAYS_ID_PUBLISH) or their registration (DAYS_REGISTRATION) shortly before applying for the loan tend to have more difficulty repaying it.

  14. The higher the rating of the region where the client lives (REGION_RATING_CLIENT), the higher the probability of payment difficulties.

  15. Clients who provided FLAG_DOCUMENT_2 are more likely to have difficulty repaying the loan.
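
The observations above are read off the plots. One way to back such a reading with numbers is to compare the share of TARGET = 1 in each category. The sketch below uses made-up toy data (not the loan dataset) and a hypothetical helper, `default_rate_by`, which is not part of `funciones_auxiliares.py`:

```python
import pandas as pd

def default_rate_by(df, col, target="TARGET"):
    """Hypothetical helper: share of target == 1 and row count per category of `col`."""
    return (
        df.assign(**{target: df[target].astype(int)})  # category -> int so the mean is a rate
          .groupby(col, observed=True)[target]
          .agg(default_rate="mean", n="size")
          .sort_values("default_rate", ascending=False)
    )

# Toy data: 2 of 4 men and 1 of 4 women have TARGET = 1
toy = pd.DataFrame({
    "CODE_GENDER": ["M", "M", "M", "M", "F", "F", "F", "F"],
    "TARGET":      [1,   0,   1,   0,   0,   0,   1,   0],
})
rates = default_rate_by(toy, "CODE_GENDER")
print(rates)
```

Reporting the group size `n` next to the rate helps avoid over-reading categories with very few clients, where a high rate may come from a handful of rows.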

Treatment of continuous variables

Treatment of outliers

Next, we look at the percentage of atypical values (outliers) in each variable. I will decide whether or not to treat them depending on their importance and their number.
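
The helper `f_aux.get_deviation_of_mean_perc` used in the next cell is defined in `funciones_auxiliares.py` and is not shown here. Its name and the `multiplier=3` argument suggest a rule that flags values lying more than three standard deviations from the mean, but that is an assumption. The sketch below illustrates that assumed rule on made-up toy data. The function `outlier_mask` is hypothetical.

```python
import pandas as pd

def outlier_mask(series, multiplier=3):
    """Flag values farther than `multiplier` standard deviations from the mean (assumed rule)."""
    mean, std = series.mean(), series.std()
    return (series < mean - multiplier * std) | (series > mean + multiplier * std)

# Toy data: nineteen ordinary values and one extreme value
toy = pd.DataFrame({
    "OWN_CAR_AGE": [5.0] * 19 + [90.0],
    "TARGET": [0] * 18 + [1, 1],
})

mask = outlier_mask(toy["OWN_CAR_AGE"], multiplier=3)
n_outliers = int(mask.sum())
share_outliers = n_outliers / len(toy)
# Class distribution of TARGET among the flagged rows only
target_share_in_outliers = toy.loc[mask, "TARGET"].value_counts(normalize=True)

print(n_outliers, share_outliers)
print(target_share_in_outliers)
```

Comparing the TARGET distribution among flagged rows with the overall one (about 8% positives in this dataset) indicates whether the outliers of a variable carry information about default or can be treated without much loss.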

In [16]:
f_aux.get_deviation_of_mean_perc(df_loan_train, list_var_continuous, target='TARGET', multiplier=3)
Out[16]:
0.0 1.0 variable sum_outlier_values porcentaje_sum_null_values
0 0.945266 0.054734 COMMONAREA_AVG 1352 0.005496
1 0.945255 0.054745 COMMONAREA_MEDI 1370 0.005569
2 0.944238 0.055762 COMMONAREA_MODE 1345 0.005467
3 0.938272 0.061728 NONLIVINGAPARTMENTS_AVG 567 0.002305
4 0.936283 0.063717 NONLIVINGAPARTMENTS_MEDI 565 0.002297
5 0.930189 0.069811 NONLIVINGAPARTMENTS_MODE 530 0.002154
6 0.948006 0.051994 LIVINGAPARTMENTS_MEDI 1404 0.005707
7 0.950216 0.049784 LIVINGAPARTMENTS_AVG 1386 0.005634
8 0.946191 0.053809 LIVINGAPARTMENTS_MODE 1431 0.005817
9 0.960894 0.039106 FLOORSMIN_MODE 358 0.001455
10 0.963801 0.036199 FLOORSMIN_AVG 442 0.001797
11 0.961446 0.038554 FLOORSMIN_MEDI 415 0.001687
12 0.928423 0.071577 YEARS_BUILD_MODE 964 0.003919
13 0.928721 0.071279 YEARS_BUILD_MEDI 954 0.003878
14 0.928346 0.071654 YEARS_BUILD_AVG 949 0.003858
15 0.918652 0.081348 OWN_CAR_AGE 2729 0.011093
16 0.937428 0.062572 LANDAREA_MEDI 1726 0.007016
17 0.936047 0.063953 LANDAREA_AVG 1720 0.006992
18 0.934682 0.065318 LANDAREA_MODE 1730 0.007032
19 0.943553 0.056447 BASEMENTAREA_MODE 1683 0.006841
20 0.946284 0.053716 BASEMENTAREA_AVG 1601 0.006508
21 0.946317 0.053683 BASEMENTAREA_MEDI 1602 0.006512
22 0.946502 0.053498 NONLIVINGAREA_AVG 1944 0.007902
23 0.946383 0.053617 NONLIVINGAREA_MODE 1977 0.008036
24 0.947315 0.052685 NONLIVINGAREA_MEDI 1955 0.007947
25 0.952972 0.047028 ELEVATORS_MEDI 1935 0.007866
26 0.953189 0.046811 ELEVATORS_AVG 1944 0.007902
27 0.949737 0.050263 ELEVATORS_MODE 2666 0.010837
28 0.947811 0.052189 APARTMENTS_AVG 2376 0.009658
29 0.947083 0.052917 APARTMENTS_MODE 2400 0.009756
30 0.947107 0.052893 APARTMENTS_MEDI 2420 0.009837
31 0.933819 0.066181 ENTRANCES_MEDI 1783 0.007248
32 0.936636 0.063364 ENTRANCES_MODE 2099 0.008532
33 0.934574 0.065426 ENTRANCES_AVG 1773 0.007207
34 0.948486 0.051514 LIVINGAREA_AVG 2543 0.010337
35 0.949210 0.050790 LIVINGAREA_MODE 2658 0.010805
36 0.949708 0.050292 LIVINGAREA_MEDI 2565 0.010426
37 0.957457 0.042543 FLOORSMAX_MODE 2092 0.008504
38 0.957170 0.042830 FLOORSMAX_AVG 2078 0.008447
39 0.956722 0.043278 FLOORSMAX_MEDI 2172 0.008829
40 0.907473 0.092527 YEARS_BEGINEXPLUATATION_MODE 562 0.002284
41 0.909408 0.090592 YEARS_BEGINEXPLUATATION_AVG 574 0.002333
42 0.905204 0.094796 YEARS_BEGINEXPLUATATION_MEDI 538 0.002187
43 0.954511 0.045489 TOTALAREA_MODE 2660 0.010813
44 0.919719 0.080281 AMT_REQ_CREDIT_BUREAU_WEEK 6826 0.027747
45 0.944912 0.055088 AMT_REQ_CREDIT_BUREAU_MON 2614 0.010626
46 0.921254 0.078746 AMT_REQ_CREDIT_BUREAU_HOUR 1308 0.005317
47 0.902357 0.097643 AMT_REQ_CREDIT_BUREAU_DAY 1188 0.004829
48 0.904549 0.095451 AMT_REQ_CREDIT_BUREAU_YEAR 2682 0.010902
49 0.910334 0.089666 AMT_REQ_CREDIT_BUREAU_QRT 1829 0.007435
50 0.911629 0.088371 OBS_60_CNT_SOCIAL_CIRCLE 4764 0.019365
51 0.876263 0.123737 DEF_60_CNT_SOCIAL_CIRCLE 3168 0.012878
52 0.911423 0.088577 OBS_30_CNT_SOCIAL_CIRCLE 4911 0.019963
53 0.884343 0.115657 DEF_30_CNT_SOCIAL_CIRCLE 5499 0.022353
54 0.960193 0.039807 AMT_GOODS_PRICE 3316 0.013479
55 0.962063 0.037937 AMT_ANNUITY 2346 0.009536
56 0.899433 0.100567 CNT_FAM_MEMBERS 3172 0.012894
57 0.949807 0.050193 DAYS_LAST_PHONE_CHANGE 518 0.002106
58 0.957110 0.042890 AMT_CREDIT 2588 0.010520
59 0.948113 0.051887 AMT_INCOME_TOTAL 212 0.000862
60 0.961561 0.038439 REGION_POPULATION_RELATIVE 6712 0.027284
61 0.962585 0.037415 DAYS_REGISTRATION 588 0.002390
  • The variables worth highlighting are 'AMT_CREDIT', the total amount lent to the client, and 'AMT_INCOME_TOTAL', the client's total income, since both may be relevant to 'TARGET'. Roughly 8% of clients overall have payment difficulties. Among the outliers of these two variables the rate is lower (about 4.3% and 5.2% respectively), and they account for only about 1.05% and 0.09% of the rows, so their number is not a concern. The outliers should be kept in mind, but a priori they should not affect the final conclusions because there are so few of them.

In addition, the share of outliers is very low in every variable (below 3% in all cases) and should not significantly affect the results, so for now I will keep them.
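The source of `f_aux.get_deviation_of_mean_perc` is not shown here, so the following is only a sketch of the rule its arguments suggest: flag values further than `multiplier` standard deviations from the mean, then report how many there are and the TARGET rate among them. The function name `outlier_summary` and its column names are assumptions made for illustration.

```python
import pandas as pd

def outlier_summary(df, columns, target="TARGET", multiplier=3):
    """Count values outside mean +/- multiplier * std for each column.

    Hypothetical re-implementation for illustration; the notebook's own
    helper may differ in its details.
    """
    rows = []
    for col in columns:
        values = df[col]
        mean, std = values.mean(), values.std()
        lower, upper = mean - multiplier * std, mean + multiplier * std
        # Comparisons with NaN are False, so missing values are never flagged
        is_outlier = (values < lower) | (values > upper)
        n_out = int(is_outlier.sum())
        if n_out == 0:
            continue
        rows.append({
            "variable": col,
            "n_outliers": n_out,
            "share_outliers": n_out / len(df),
            "target_rate_in_outliers": df.loc[is_outlier, target].mean(),
        })
    return pd.DataFrame(rows)

# Toy data: 19 ordinary values and one extreme value (invented numbers)
toy = pd.DataFrame({
    "AMT_CREDIT": [10.0] * 19 + [1000.0],
    "TARGET":     [0] * 19 + [1],
})
summary = outlier_summary(toy, ["AMT_CREDIT"])
print(summary)
```

In the output above, dividing `sum_outlier_values` by the last column gives roughly the same row count for every variable, which suggests that column is the number of outliers over the total number of rows, as in `share_outliers` here. A mean and standard deviation rule is itself sensitive to extreme values, so a quantile-based rule (for example IQR fences) would be a possible alternative for the heavily skewed amount variables.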

Correlation analysis between variables¶

I will study correlations separately for the numeric, categorical and boolean variables: Pearson's correlation coefficient for the numeric variables and Cramér's V for the categorical and boolean ones. I will combine my own functions with functions seen in the master's program.
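For reference, Cramér's V rescales the chi-squared statistic of a contingency table to the range [0, 1]: V = sqrt(chi2 / (n * (min(r, c) - 1))), where n is the number of observations and r and c are the numbers of rows and columns. The sketch below is one way to compute it with `chi2_contingency`, which the notebook already imports. The function `cramers_v` is written here for illustration and is not necessarily the version in `funciones_auxiliares`.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

def cramers_v(x, y):
    """Uncorrected Cramér's V between two categorical series.

    Illustrative sketch; returns NaN when either variable has a single level.
    """
    table = pd.crosstab(x, y)
    if table.shape[0] < 2 or table.shape[1] < 2:
        return np.nan
    # correction=False disables Yates' continuity correction for 2x2 tables
    chi2, _, _, _ = chi2_contingency(table, correction=False)
    n = table.to_numpy().sum()
    min_dim = min(table.shape) - 1
    return float(np.sqrt(chi2 / (n * min_dim)))

# Toy checks with invented data
x = pd.Series(["a", "a", "b", "b"] * 5)
v_identical = cramers_v(x, x)  # a variable against itself
v_independent = cramers_v(
    pd.Series(["a", "a", "b", "b"]),
    pd.Series(["u", "v", "u", "v"]),
)
print(v_identical, v_independent)
```

Unlike Pearson's coefficient, V has no sign, so it measures the strength of an association but not its direction. The uncorrected version shown here tends to overestimate the association in small samples or with many categories.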

Correlation matrix for numeric variables¶

In [19]:
corr = pd.concat([df_loan_train.select_dtypes('number').drop(df_loan_bool, axis=1), df_loan_train['TARGET']], axis=1).corr(method='pearson')
corr
Out[19]:
SK_ID_CURR COMMONAREA_AVG COMMONAREA_MEDI COMMONAREA_MODE NONLIVINGAPARTMENTS_AVG NONLIVINGAPARTMENTS_MEDI NONLIVINGAPARTMENTS_MODE LIVINGAPARTMENTS_MEDI LIVINGAPARTMENTS_AVG LIVINGAPARTMENTS_MODE FLOORSMIN_MODE FLOORSMIN_AVG FLOORSMIN_MEDI YEARS_BUILD_MODE YEARS_BUILD_MEDI YEARS_BUILD_AVG OWN_CAR_AGE LANDAREA_MEDI LANDAREA_AVG LANDAREA_MODE BASEMENTAREA_MODE BASEMENTAREA_AVG BASEMENTAREA_MEDI EXT_SOURCE_1 NONLIVINGAREA_AVG NONLIVINGAREA_MODE NONLIVINGAREA_MEDI ELEVATORS_MEDI ELEVATORS_AVG ELEVATORS_MODE APARTMENTS_AVG APARTMENTS_MODE APARTMENTS_MEDI ENTRANCES_MEDI ENTRANCES_MODE ENTRANCES_AVG LIVINGAREA_AVG LIVINGAREA_MODE LIVINGAREA_MEDI FLOORSMAX_MODE FLOORSMAX_AVG FLOORSMAX_MEDI YEARS_BEGINEXPLUATATION_MODE YEARS_BEGINEXPLUATATION_AVG YEARS_BEGINEXPLUATATION_MEDI TOTALAREA_MODE EXT_SOURCE_3 AMT_REQ_CREDIT_BUREAU_WEEK AMT_REQ_CREDIT_BUREAU_MON AMT_REQ_CREDIT_BUREAU_HOUR AMT_REQ_CREDIT_BUREAU_DAY AMT_REQ_CREDIT_BUREAU_YEAR AMT_REQ_CREDIT_BUREAU_QRT OBS_60_CNT_SOCIAL_CIRCLE DEF_60_CNT_SOCIAL_CIRCLE OBS_30_CNT_SOCIAL_CIRCLE DEF_30_CNT_SOCIAL_CIRCLE EXT_SOURCE_2 AMT_GOODS_PRICE AMT_ANNUITY CNT_FAM_MEMBERS DAYS_LAST_PHONE_CHANGE HOUR_APPR_PROCESS_START AMT_CREDIT AMT_INCOME_TOTAL CNT_CHILDREN REGION_POPULATION_RELATIVE DAYS_BIRTH DAYS_EMPLOYED DAYS_REGISTRATION DAYS_ID_PUBLISH REGION_RATING_CLIENT REGION_RATING_CLIENT_W_CITY TARGET
SK_ID_CURR 1.000000 -0.000618 -0.000306 -0.000235 -0.004044 -0.004488 -0.003741 0.003474 0.003429 0.004078 0.003772 0.004766 0.004513 0.007333 0.007869 0.008118 0.000983 0.003210 0.002934 0.003035 -0.000687 -0.001218 -0.001100 -0.000098 0.002460 0.001556 0.001669 0.005666 0.005552 0.005788 0.001911 0.002158 0.002255 -0.002076 -0.002179 -0.002377 0.003940 0.004250 0.004374 0.005201 0.005760 0.005355 0.002445 0.002513 0.002298 0.003307 -0.000007 0.001299 0.000227 -0.002844 -0.001018 0.004930 -0.000050 -0.001489 0.000678 -0.001404 -0.000575 0.001123 0.000227 -0.000003 -0.002231 0.000776 0.000205 0.000214 -0.001795 -0.000688 0.001271 -0.000841 0.001274 -0.000630 -0.000887 -0.001853 -0.001741 -0.000581
COMMONAREA_AVG -0.000618 1.000000 0.995723 0.976990 0.104080 0.103623 0.101982 0.532262 0.530703 0.523296 0.287190 0.294760 0.294020 0.226436 0.229366 0.229621 -0.038274 0.254899 0.253077 0.240619 0.383025 0.401366 0.400316 0.032502 0.227503 0.215756 0.227223 0.518537 0.520095 0.501695 0.536826 0.511312 0.536078 0.322824 0.299515 0.325433 0.544066 0.519428 0.542972 0.395279 0.401736 0.400223 0.050956 0.095025 0.078857 0.550656 -0.005499 -0.009497 0.022451 0.006416 -0.000265 -0.014661 -0.010515 -0.020677 -0.014209 -0.021039 -0.012428 0.053179 0.049932 0.056695 0.000262 -0.002659 0.047662 0.049198 0.086203 -0.000503 0.168101 0.006585 -0.008967 0.024592 -0.000485 -0.120701 -0.130876 -0.021858
COMMONAREA_MEDI -0.000306 0.995723 1.000000 0.980186 0.104108 0.104648 0.103190 0.535049 0.531468 0.526520 0.285829 0.293205 0.292670 0.227598 0.230334 0.230275 -0.037959 0.257784 0.255666 0.244055 0.386034 0.402608 0.402752 0.031532 0.227796 0.217994 0.229016 0.520443 0.520169 0.503396 0.537877 0.514492 0.539034 0.325659 0.302568 0.327092 0.545263 0.522608 0.545882 0.394667 0.400655 0.399626 0.051044 0.095260 0.079089 0.550483 -0.005625 -0.009552 0.022149 0.006569 -0.000085 -0.014401 -0.010050 -0.020014 -0.013928 -0.020368 -0.012346 0.051516 0.048917 0.055852 0.000731 -0.002478 0.046151 0.048203 0.084201 -0.000145 0.163327 0.007296 -0.009276 0.025303 -0.000236 -0.117366 -0.127754 -0.021818
COMMONAREA_MODE -0.000235 0.976990 0.980186 1.000000 0.101379 0.102925 0.106578 0.524858 0.520722 0.535876 0.276881 0.275963 0.275386 0.224705 0.221103 0.221461 -0.032890 0.266620 0.263642 0.262352 0.402390 0.400002 0.402012 0.027070 0.220478 0.227433 0.224863 0.505053 0.503870 0.504118 0.527508 0.524077 0.529621 0.332168 0.321489 0.332897 0.534595 0.533735 0.536113 0.378377 0.376467 0.375441 0.049195 0.090068 0.074009 0.541181 -0.004424 -0.008405 0.019809 0.006513 0.000204 -0.013372 -0.009280 -0.016636 -0.013215 -0.016998 -0.011801 0.043665 0.041974 0.047572 0.000838 -0.000391 0.040003 0.041446 0.072656 -0.000906 0.134159 0.007584 -0.009378 0.025497 -0.000491 -0.095498 -0.107276 -0.019588
NONLIVINGAPARTMENTS_AVG -0.004044 0.104080 0.104108 0.101379 1.000000 0.988800 0.968168 0.155765 0.160672 0.142623 0.069839 0.073014 0.072611 0.071022 0.071933 0.072432 -0.027417 0.065266 0.063079 0.059149 0.091922 0.096333 0.095775 0.017335 0.217959 0.208738 0.216835 0.121778 0.121878 0.114281 0.196310 0.181238 0.192000 0.061096 0.052706 0.061623 0.136229 0.127699 0.135458 0.108526 0.113893 0.112877 0.020760 0.035872 0.032569 0.144837 0.009442 -0.003654 -0.000560 0.000469 -0.001643 0.001379 0.002805 -0.001056 -0.001319 -0.001377 0.001349 0.019233 0.014541 0.022276 0.002755 0.001123 0.014680 0.013413 0.030406 0.004179 0.024268 0.000849 -0.002721 0.035364 -0.008094 -0.018347 -0.021329 -0.003702
NONLIVINGAPARTMENTS_MEDI -0.004488 0.103623 0.104648 0.102925 0.988800 1.000000 0.979302 0.156919 0.155997 0.144498 0.068459 0.071046 0.071093 0.069883 0.070533 0.070675 -0.026958 0.062342 0.061601 0.057166 0.093564 0.096266 0.096361 0.016615 0.218406 0.211433 0.218793 0.121835 0.121147 0.115012 0.194906 0.184346 0.193856 0.062836 0.055111 0.062890 0.136504 0.129633 0.136379 0.107229 0.111649 0.111432 0.020289 0.034919 0.031826 0.144587 0.008861 -0.003997 -0.000965 0.000675 -0.001680 0.001970 0.003295 -0.000561 -0.000911 -0.000880 0.001888 0.018113 0.013412 0.021405 0.003062 0.001182 0.014174 0.012401 0.028913 0.004442 0.021699 0.000777 -0.002782 0.034240 -0.007466 -0.015891 -0.019139 -0.002904
NONLIVINGAPARTMENTS_MODE -0.003741 0.101982 0.103190 0.106578 0.968168 0.979302 1.000000 0.146722 0.145581 0.146565 0.067889 0.066070 0.066105 0.067812 0.066010 0.066057 -0.024587 0.062815 0.061931 0.062180 0.098268 0.095715 0.096317 0.015245 0.212423 0.214904 0.213502 0.116300 0.115512 0.115426 0.189573 0.186793 0.188568 0.065015 0.061913 0.064713 0.131247 0.132183 0.131377 0.101441 0.102779 0.102801 0.019254 0.032312 0.029473 0.139331 0.008848 -0.004205 -0.001375 -0.000420 -0.001305 0.002258 0.003143 -0.000231 -0.000246 -0.000553 0.003069 0.016875 0.010851 0.017211 0.002576 0.000882 0.012107 0.010076 0.025624 0.004294 0.016331 0.001163 -0.003421 0.032723 -0.007737 -0.010272 -0.014123 -0.001785
LIVINGAPARTMENTS_MEDI 0.003474 0.532262 0.535049 0.524858 0.155765 0.156919 0.146722 1.000000 0.993444 0.975784 0.433319 0.440796 0.439792 0.332742 0.333236 0.334181 -0.049922 0.425089 0.421164 0.415587 0.629270 0.650066 0.651839 0.043665 0.292987 0.276488 0.292079 0.816285 0.814531 0.801272 0.943828 0.916230 0.944156 0.567007 0.537489 0.568360 0.884652 0.858999 0.886539 0.584335 0.590479 0.588101 0.088925 0.153387 0.131092 0.847531 0.000900 -0.007432 0.032529 0.002651 0.003484 -0.013095 -0.008347 -0.028310 -0.016995 -0.028816 -0.015635 0.078604 0.061198 0.074110 -0.004163 -0.002901 0.078353 0.058731 0.105237 -0.005822 0.190426 0.013687 -0.020043 0.025284 0.000204 -0.152176 -0.176999 -0.025916
LIVINGAPARTMENTS_AVG 0.003429 0.530703 0.531468 0.520722 0.160672 0.155997 0.145581 0.993444 1.000000 0.970003 0.432860 0.441105 0.439306 0.330950 0.331908 0.333106 -0.050750 0.420759 0.417510 0.410504 0.624144 0.647704 0.646803 0.045369 0.292061 0.273629 0.289557 0.811084 0.813447 0.795752 0.945602 0.909630 0.936836 0.561134 0.531457 0.565461 0.881894 0.852971 0.879724 0.584088 0.591459 0.588124 0.088665 0.152964 0.130738 0.849248 0.001055 -0.007485 0.032595 0.002833 0.003390 -0.012730 -0.008789 -0.028427 -0.017006 -0.028928 -0.015667 0.080303 0.062989 0.076515 -0.004810 -0.003382 0.079959 0.060508 0.107432 -0.006488 0.195956 0.013299 -0.020296 0.024839 0.000710 -0.156766 -0.181184 -0.026580
LIVINGAPARTMENTS_MODE 0.004078 0.523296 0.526520 0.535876 0.142623 0.144498 0.146565 0.975784 0.970003 1.000000 0.431111 0.428039 0.427364 0.331959 0.324816 0.325796 -0.044679 0.436411 0.431860 0.438350 0.653435 0.648962 0.651692 0.038305 0.284769 0.287368 0.286799 0.800515 0.798802 0.809301 0.931941 0.939327 0.933477 0.573767 0.566713 0.575143 0.874249 0.879962 0.875901 0.573404 0.569560 0.567508 0.087476 0.148304 0.126255 0.834733 0.001906 -0.007015 0.030218 0.003853 0.003741 -0.012366 -0.008217 -0.025142 -0.017523 -0.025629 -0.016124 0.071318 0.054533 0.065992 -0.004381 -0.003171 0.072238 0.052481 0.092782 -0.006230 0.164517 0.013336 -0.019826 0.023973 0.000049 -0.129571 -0.155851 -0.024955
FLOORSMIN_MODE 0.003772 0.287190 0.285829 0.276881 0.069839 0.068459 0.067889 0.433319 0.432860 0.431111 1.000000 0.986275 0.988711 0.354088 0.352507 0.352567 -0.073863 0.152698 0.150064 0.149188 0.207001 0.220236 0.217340 0.067297 0.147103 0.136329 0.143140 0.500231 0.500414 0.496078 0.437226 0.424276 0.435556 0.034670 0.028875 0.037452 0.458830 0.444623 0.457790 0.727696 0.723655 0.724492 0.100572 0.168074 0.148876 0.446324 0.003778 -0.001291 0.035653 0.003737 0.003338 -0.008855 -0.004238 -0.035979 -0.022872 -0.036522 -0.025390 0.106986 0.076515 0.094729 -0.001186 -0.006971 0.113720 0.074611 0.130492 -0.009376 0.273877 0.000420 -0.013644 0.019499 -0.009859 -0.215123 -0.222929 -0.033119
FLOORSMIN_AVG 0.004766 0.294760 0.293205 0.275963 0.073014 0.071046 0.066070 0.440796 0.441105 0.428039 0.986275 1.000000 0.997300 0.352802 0.358981 0.359817 -0.076332 0.150093 0.147504 0.139141 0.199227 0.222760 0.219260 0.070879 0.153013 0.131155 0.146666 0.510074 0.511838 0.496145 0.445280 0.419621 0.442561 0.031725 0.016065 0.034497 0.467477 0.440933 0.465169 0.730044 0.743030 0.740669 0.101034 0.172300 0.152133 0.456486 0.002409 -0.001575 0.039477 0.003833 0.003686 -0.010269 -0.004978 -0.038168 -0.023657 -0.038671 -0.026169 0.112450 0.080338 0.100250 -0.002877 -0.007270 0.119442 0.078129 0.139013 -0.010143 0.292362 0.001133 -0.014006 0.020757 -0.009386 -0.229994 -0.236985 -0.033705
FLOORSMIN_MEDI 0.004513 0.294020 0.292670 0.275386 0.072611 0.071093 0.066105 0.439792 0.439306 0.427364 0.988711 0.997300 1.000000 0.353089 0.359400 0.359322 -0.076610 0.150355 0.147874 0.139709 0.198881 0.221369 0.218122 0.069767 0.152203 0.131145 0.146481 0.509386 0.509984 0.495428 0.443479 0.418987 0.441581 0.030663 0.015810 0.033887 0.465766 0.440049 0.464294 0.730901 0.740699 0.741322 0.100881 0.171914 0.152238 0.454403 0.002280 -0.000978 0.038721 0.003881 0.003681 -0.010540 -0.004967 -0.037967 -0.023556 -0.038457 -0.026158 0.111551 0.079628 0.098972 -0.002178 -0.007243 0.118550 0.077513 0.137605 -0.009670 0.288614 0.001302 -0.014512 0.020821 -0.009253 -0.227258 -0.234200 -0.033636
YEARS_BUILD_MODE 0.007333 0.226436 0.227598 0.224705 0.071022 0.069883 0.067812 0.332742 0.330950 0.331959 0.354088 0.352802 0.353089 1.000000 0.989634 0.989766 -0.043684 0.183206 0.181211 0.177669 0.243382 0.248353 0.247039 0.013757 0.125128 0.117844 0.124120 0.339740 0.338728 0.336509 0.337593 0.328968 0.337040 0.091311 0.085704 0.092882 0.352223 0.344393 0.352294 0.510358 0.508150 0.508380 0.302129 0.492266 0.438762 0.355397 0.014674 -0.006569 -0.004297 0.001198 0.001962 -0.020694 -0.006423 0.001401 -0.011099 0.001537 -0.010162 0.007695 0.038318 0.030641 0.041360 0.011749 -0.016409 0.033075 0.038279 0.029196 -0.064028 0.025823 -0.006851 0.163429 -0.009393 0.048298 0.040781 -0.025586
YEARS_BUILD_MEDI 0.007869 0.229366 0.230334 0.221103 0.071933 0.070533 0.066010 0.333236 0.331908 0.324816 0.352507 0.358981 0.359400 0.989634 1.000000 0.998634 -0.044703 0.179507 0.178186 0.168604 0.232960 0.246902 0.244974 0.014411 0.127607 0.112985 0.124902 0.342223 0.341850 0.333069 0.338268 0.321492 0.337094 0.085301 0.072267 0.087948 0.353487 0.337070 0.352896 0.511288 0.517014 0.517300 0.299885 0.497321 0.443892 0.357755 0.015024 -0.006244 -0.004164 0.001142 0.003460 -0.021299 -0.007438 0.000646 -0.011636 0.000839 -0.010555 0.010393 0.039981 0.032850 0.041839 0.011615 -0.014470 0.034655 0.042482 0.029595 -0.058163 0.027171 -0.007974 0.164861 -0.009253 0.043189 0.036414 -0.025933
YEARS_BUILD_AVG 0.008118 0.229621 0.230275 0.221461 0.072432 0.070675 0.066057 0.334181 0.333106 0.325796 0.352567 0.359817 0.359322 0.989766 0.998634 1.000000 -0.044935 0.179786 0.178408 0.168751 0.233523 0.247753 0.245664 0.014988 0.127629 0.112896 0.124702 0.342611 0.342693 0.333591 0.339399 0.322318 0.337949 0.086125 0.072993 0.088597 0.354490 0.337875 0.353607 0.511258 0.518305 0.517313 0.299906 0.497986 0.443345 0.359051 0.015181 -0.006283 -0.004172 0.001230 0.003057 -0.021440 -0.007304 0.000507 -0.011478 0.000709 -0.010424 0.010791 0.040326 0.033351 0.041869 0.011920 -0.014282 0.034931 0.042782 0.029646 -0.057069 0.026899 -0.007603 0.165196 -0.009454 0.042167 0.035435 -0.025685
OWN_CAR_AGE 0.000983 -0.038274 -0.037959 -0.032890 -0.027417 -0.026958 -0.024587 -0.049922 -0.050750 -0.044679 -0.073863 -0.076332 -0.076610 -0.043684 -0.044703 -0.044935 1.000000 -0.021395 -0.021384 -0.019544 -0.026777 -0.032436 -0.031287 -0.081396 -0.032484 -0.028765 -0.032077 -0.065794 -0.066436 -0.061457 -0.051160 -0.045620 -0.049992 -0.016462 -0.012329 -0.017163 -0.059801 -0.054950 -0.058606 -0.080548 -0.082869 -0.082545 0.001837 -0.000012 0.000043 -0.061077 -0.013837 0.003276 -0.022521 0.003907 -0.006480 -0.015641 -0.017527 0.005161 0.011677 0.005222 0.007421 -0.081239 -0.106258 -0.099371 -0.015176 0.002689 -0.069504 -0.096874 -0.119654 0.009539 -0.082891 0.007699 0.028075 -0.025165 0.008747 0.086297 0.087654 0.039531
LANDAREA_MEDI 0.003210 0.254899 0.257784 0.266620 0.065266 0.062342 0.062815 0.425089 0.420759 0.436411 0.152698 0.150093 0.150355 0.183206 0.179507 0.179786 -0.021395 1.000000 0.990884 0.981228 0.475542 0.471256 0.472674 0.005099 0.161373 0.162151 0.164334 0.378455 0.376991 0.380360 0.498729 0.500957 0.500756 0.511590 0.502198 0.512221 0.503788 0.505662 0.504540 0.220507 0.217760 0.217653 0.054186 0.076599 0.071351 0.493214 0.009260 0.005231 0.011826 -0.001021 0.005569 -0.011681 0.006480 -0.003551 -0.001748 -0.003813 -0.002895 0.021615 0.011375 0.005896 0.000430 -0.000237 0.014274 0.004690 -0.002390 -0.004147 -0.053101 0.004539 -0.011408 0.003442 -0.005515 0.046965 0.037945 -0.013984
LANDAREA_AVG 0.002934 0.253077 0.255666 0.263642 0.063079 0.061601 0.061931 0.421164 0.417510 0.431860 0.150064 0.147504 0.147874 0.181211 0.178186 0.178408 -0.021384 0.990884 1.000000 0.972972 0.470694 0.468224 0.469202 0.005084 0.160731 0.160244 0.162543 0.375876 0.375158 0.377513 0.495934 0.496410 0.496969 0.507562 0.497818 0.508848 0.501159 0.501364 0.501161 0.219714 0.216961 0.216819 0.053952 0.076331 0.071097 0.491015 0.009236 0.007634 0.012075 -0.001104 0.005682 -0.012393 0.006054 -0.003694 -0.001492 -0.003964 -0.002509 0.022506 0.011802 0.006374 0.000102 0.000591 0.014503 0.005175 -0.002143 -0.004457 -0.051987 0.004210 -0.011420 0.003438 -0.005355 0.045123 0.036342 -0.013539
LANDAREA_MODE 0.003035 0.240619 0.244055 0.262352 0.059149 0.057166 0.062180 0.415587 0.410504 0.438350 0.149188 0.139141 0.139709 0.177669 0.168604 0.168751 -0.019544 0.981228 0.972972 1.000000 0.484460 0.464144 0.466461 0.003333 0.154759 0.168565 0.159405 0.364891 0.362521 0.380411 0.487670 0.508529 0.490293 0.511949 0.518244 0.511956 0.491731 0.513547 0.493367 0.212257 0.202091 0.202247 0.052933 0.072452 0.067231 0.479343 0.008100 0.005646 0.010784 -0.000234 0.005862 -0.010501 0.006728 -0.002552 -0.002895 -0.002832 -0.003729 0.017290 0.007835 0.001457 0.001572 -0.000183 0.011613 0.001402 -0.004020 -0.003953 -0.061096 0.004763 -0.010425 0.004006 -0.005961 0.058796 0.048524 -0.012519
BASEMENTAREA_MODE -0.000687 0.383025 0.386034 0.402390 0.091922 0.093564 0.098268 0.629270 0.624144 0.653435 0.207001 0.199227 0.198881 0.243382 0.232960 0.233523 -0.026777 0.475542 0.470694 0.484460 1.000000 0.975291 0.978262 0.033933 0.254772 0.270229 0.259540 0.539354 0.538022 0.552293 0.660366 0.678423 0.662998 0.651956 0.653784 0.652995 0.673436 0.690212 0.672895 0.308825 0.298168 0.297787 0.059862 0.083918 0.076458 0.648240 0.004110 -0.002767 0.019158 -0.000325 0.004118 -0.011166 -0.002863 -0.010674 -0.011675 -0.011010 -0.009459 0.037158 0.037724 0.036378 -0.004981 -0.005732 0.034527 0.033595 0.011618 -0.009291 0.066314 -0.002691 -0.000176 -0.018812 -0.011839 -0.032146 -0.046738 -0.021323
BASEMENTAREA_AVG -0.001218 0.401366 0.402608 0.400002 0.096333 0.096266 0.095715 0.650066 0.647704 0.648962 0.220236 0.222760 0.221369 0.248353 0.246902 0.247753 -0.032436 0.471256 0.468224 0.464144 0.975291 1.000000 0.995783 0.039123 0.263714 0.258374 0.262911 0.561617 0.563952 0.554458 0.679918 0.667089 0.678909 0.647729 0.627240 0.651806 0.693521 0.678494 0.690215 0.328630 0.329492 0.327642 0.061477 0.089229 0.081782 0.673316 0.005423 -0.002262 0.020907 -0.001259 0.004760 -0.012728 -0.003567 -0.015154 -0.013251 -0.015466 -0.010879 0.047843 0.045509 0.046552 -0.005527 -0.006458 0.041399 0.041226 0.015454 -0.009050 0.098987 -0.002384 -0.001224 -0.020079 -0.012849 -0.061396 -0.074168 -0.023834
BASEMENTAREA_MEDI -0.001100 0.400316 0.402752 0.402012 0.095775 0.096361 0.096317 0.651839 0.646803 0.651692 0.217340 0.219260 0.218122 0.247039 0.244974 0.245664 -0.031287 0.472674 0.469202 0.466461 0.978262 0.995783 1.000000 0.038267 0.262984 0.260387 0.264363 0.561737 0.561484 0.554872 0.678683 0.668909 0.680415 0.651333 0.631156 0.652724 0.692532 0.680624 0.691541 0.325050 0.325273 0.323735 0.060960 0.088566 0.081095 0.669533 0.005378 -0.002666 0.021017 -0.001086 0.005041 -0.012201 -0.003754 -0.014443 -0.012850 -0.014743 -0.010474 0.046458 0.043617 0.044472 -0.005794 -0.007090 0.040947 0.039395 0.014711 -0.009238 0.094199 -0.002358 -0.001120 -0.020656 -0.013046 -0.057318 -0.070167 -0.023122
EXT_SOURCE_1 -0.000098 0.032502 0.031532 0.027070 0.017335 0.016615 0.015245 0.043665 0.045369 0.038305 0.067297 0.070879 0.069767 0.013757 0.014411 0.014988 -0.081396 0.005099 0.005084 0.003333 0.033933 0.039123 0.038267 1.000000 0.030153 0.024627 0.028812 0.070254 0.071731 0.066530 0.051169 0.045355 0.049499 0.019566 0.016316 0.020374 0.065437 0.060149 0.064241 0.086689 0.089897 0.088767 -0.002320 -0.001223 -0.000907 0.063785 0.185211 -0.002503 0.031976 -0.006640 -0.004104 0.005301 -0.002403 -0.026333 -0.030973 -0.026887 -0.028715 0.213917 0.174615 0.119410 -0.096102 -0.130211 0.032487 0.167599 0.023251 -0.138459 0.098941 -0.598890 0.289068 -0.178719 -0.132527 -0.113677 -0.113373 -0.155781
NONLIVINGAREA_AVG 0.002460 0.227503 0.227796 0.220478 0.217959 0.218406 0.212423 0.292987 0.292061 0.284769 0.147103 0.153013 0.152203 0.125128 0.127607 0.127629 -0.032484 0.161373 0.160731 0.154759 0.254772 0.263714 0.262984 0.030153 1.000000 0.966617 0.990679 0.279937 0.282617 0.274017 0.298349 0.285919 0.295956 0.161830 0.155001 0.164597 0.300217 0.285037 0.296622 0.248043 0.253370 0.252478 -0.008654 0.012008 0.013086 0.365713 -0.002831 -0.007740 0.012384 0.002492 0.001485 -0.009466 -0.002690 -0.017470 -0.013272 -0.017583 -0.013243 0.045519 0.044956 0.054684 0.004982 -0.004054 0.044565 0.040894 0.077089 0.003166 0.076143 0.004914 -0.014019 0.052079 0.001327 -0.082002 -0.082463 -0.012034
NONLIVINGAREA_MODE 0.001556 0.215756 0.217994 0.227433 0.208738 0.211433 0.214904 0.276488 0.273629 0.287368 0.136329 0.131155 0.131145 0.117844 0.112985 0.112896 -0.028765 0.162151 0.160244 0.168565 0.270229 0.258374 0.260387 0.024627 0.966617 1.000000 0.976036 0.264544 0.263841 0.273111 0.282916 0.292644 0.283986 0.169134 0.175061 0.169416 0.283470 0.293653 0.282900 0.231310 0.225307 0.225592 -0.004004 0.010092 0.008034 0.345283 -0.003298 -0.007128 0.009529 0.002035 0.000482 -0.007350 -0.001136 -0.013123 -0.012010 -0.013260 -0.011674 0.037709 0.039309 0.045902 0.005223 -0.004427 0.038635 0.035297 0.064064 0.002664 0.051838 0.004090 -0.012948 0.049933 0.000026 -0.059503 -0.060918 -0.010751
NONLIVINGAREA_MEDI 0.001669 0.227223 0.229016 0.224863 0.216835 0.218793 0.213502 0.292079 0.289557 0.286799 0.143140 0.146666 0.146481 0.124120 0.124902 0.124702 -0.032077 0.164334 0.162543 0.159405 0.259540 0.262911 0.264363 0.028812 0.990679 0.976036 1.000000 0.279554 0.279643 0.274803 0.296336 0.288648 0.296514 0.164696 0.159249 0.165812 0.298327 0.288660 0.296795 0.243800 0.247064 0.247169 -0.009434 0.010814 0.012117 0.360934 -0.003549 -0.007969 0.011464 0.001938 0.000551 -0.008823 -0.002270 -0.015676 -0.012892 -0.015779 -0.012713 0.043267 0.042839 0.051977 0.005025 -0.004454 0.043368 0.038741 0.073302 0.003049 0.067917 0.005578 -0.014081 0.052898 0.001627 -0.075075 -0.075988 -0.011442
ELEVATORS_MEDI 0.005666 0.518537 0.520443 0.505053 0.121778 0.121835 0.116300 0.816285 0.811084 0.800515 0.500231 0.510074 0.509386 0.339740 0.342223 0.342611 -0.065794 0.378455 0.375876 0.364891 0.539354 0.561617 0.561737 0.070254 0.279937 0.264544 0.279554 1.000000 0.995951 0.982569 0.834284 0.807492 0.836612 0.403318 0.378022 0.403711 0.865624 0.840316 0.868147 0.669392 0.676676 0.676016 0.073841 0.079690 0.078694 0.838507 0.006905 -0.003133 0.040722 0.000570 0.002988 -0.016773 -0.004685 -0.034902 -0.023556 -0.035381 -0.022742 0.113715 0.083950 0.102732 0.000133 -0.011752 0.105367 0.081052 0.039690 -0.005835 0.275174 -0.000223 -0.008678 0.000790 -0.010731 -0.221633 -0.233193 -0.035791
ELEVATORS_AVG 0.005552 0.520095 0.520169 0.503870 0.121878 0.121147 0.115512 0.814531 0.813447 0.798802 0.500414 0.511838 0.509984 0.338728 0.341850 0.342693 -0.066436 0.376991 0.375158 0.362521 0.538022 0.563952 0.561484 0.071731 0.282617 0.263841 0.279643 0.995951 1.000000 0.978454 0.836059 0.804616 0.833627 0.400296 0.374157 0.403905 0.867534 0.838036 0.865341 0.671129 0.680446 0.678042 0.073437 0.079682 0.078568 0.845008 0.006803 -0.003030 0.040755 0.000927 0.003282 -0.017063 -0.005053 -0.035805 -0.024005 -0.036295 -0.023109 0.115388 0.085197 0.104433 -0.000277 -0.011418 0.106407 0.082385 0.040491 -0.006032 0.281380 -0.000371 -0.008651 -0.000080 -0.010767 -0.227037 -0.238425 -0.036381
ELEVATORS_MODE 0.005788 0.501695 0.503396 0.504118 0.114281 0.115012 0.115426 0.801272 0.795752 0.809301 0.496078 0.496145 0.495428 0.336509 0.333069 0.333591 -0.061457 0.380360 0.377513 0.380411 0.552293 0.554458 0.554872 0.066530 0.274017 0.273111 0.274803 0.982569 0.978454 1.000000 0.821441 0.825110 0.824538 0.402745 0.401050 0.402711 0.851998 0.855616 0.855201 0.661175 0.656435 0.655700 0.077213 0.079978 0.079048 0.820160 0.006742 -0.002594 0.038436 0.000721 0.003161 -0.015946 -0.004350 -0.031987 -0.023458 -0.032493 -0.022422 0.106503 0.079855 0.096243 0.001385 -0.010413 0.099335 0.076927 0.036894 -0.005723 0.252585 -0.000107 -0.008199 0.001957 -0.010496 -0.201538 -0.213864 -0.034306
APARTMENTS_AVG 0.001911 0.536826 0.537877 0.527508 0.196310 0.194906 0.189573 0.943828 0.945602 0.931941 0.437226 0.445280 0.443479 0.337593 0.338268 0.339399 -0.051160 0.498729 0.495934 0.487670 0.660366 0.679918 0.678683 0.051169 0.298349 0.282916 0.296336 0.834284 0.836059 0.821441 1.000000 0.972824 0.995015 0.606101 0.581537 0.609724 0.914218 0.893655 0.913113 0.613988 0.618444 0.616228 0.096675 0.101424 0.100973 0.892090 0.003258 -0.003478 0.034102 0.001789 0.004611 -0.015733 -0.002850 -0.024016 -0.016403 -0.024522 -0.013851 0.090343 0.067394 0.079158 -0.010062 -0.008792 0.083651 0.063280 0.031310 -0.012330 0.206390 0.006776 -0.017006 0.013472 -0.006499 -0.152610 -0.172048 -0.031644
APARTMENTS_MODE 0.002158 0.511312 0.514492 0.524077 0.181238 0.184346 0.186793 0.916230 0.909630 0.939327 0.424276 0.419621 0.418987 0.328968 0.321492 0.322318 -0.045620 0.500957 0.496410 0.508529 0.678423 0.667089 0.668909 0.045355 0.285919 0.292644 0.288648 0.807492 0.804616 0.825110 0.972824 1.000000 0.976870 0.610962 0.614574 0.610929 0.890888 0.911286 0.894377 0.595375 0.585385 0.584368 0.101931 0.101663 0.100944 0.862033 0.002695 -0.003112 0.031627 0.002413 0.004402 -0.013849 -0.002504 -0.020334 -0.016230 -0.020814 -0.013413 0.079769 0.059877 0.068941 -0.008183 -0.007959 0.074785 0.055799 0.027137 -0.011344 0.175433 0.006669 -0.015507 0.013142 -0.006220 -0.123397 -0.143996 -0.029427
APARTMENTS_MEDI 0.002255 0.536078 0.539034 0.529621 0.192000 0.193856 0.188568 0.944156 0.936836 0.933477 0.435556 0.442561 0.441581 0.337040 0.337094 0.337949 -0.049992 0.500756 0.496969 0.490293 0.662998 0.678909 0.680415 0.049499 0.295956 0.283986 0.296514 0.836612 0.833627 0.824538 0.995015 0.976870 1.000000 0.610141 0.586601 0.610304 0.913205 0.896485 0.916740 0.612051 0.614861 0.613871 0.096835 0.101344 0.101165 0.886104 0.002998 -0.003533 0.033987 0.001865 0.004587 -0.015310 -0.002789 -0.023605 -0.016286 -0.024106 -0.013698 0.088616 0.065768 0.077002 -0.009991 -0.009078 0.082497 0.061599 0.030678 -0.012264 0.201838 0.006985 -0.016718 0.013577 -0.006452 -0.148410 -0.167831 -0.031137
ENTRANCES_MEDI -0.002076 0.322824 0.325659 0.332168 0.061096 0.062836 0.065015 0.567007 0.561134 0.573767 0.034670 0.031725 0.030663 0.091311 0.085301 0.086125 -0.016462 0.511590 0.507562 0.511949 0.651956 0.647729 0.651333 0.019566 0.161830 0.169134 0.164696 0.403318 0.400296 0.402745 0.606101 0.610962 0.610141 1.000000 0.980457 0.996902 0.615481 0.622494 0.619575 0.086672 0.083234 0.081517 0.037591 0.041857 0.040780 0.587397 0.008871 -0.000142 0.013349 -0.002721 0.006788 -0.010360 -0.000025 0.000122 -0.004201 -0.000143 -0.000596 0.031061 0.017277 0.012701 -0.003046 -0.012220 0.021172 0.013505 0.004576 -0.006975 0.033167 -0.008534 0.002773 -0.062268 -0.013221 -0.021531 -0.028446 -0.020116
ENTRANCES_MODE -0.002179 0.299515 0.302568 0.321489 0.052706 0.055111 0.061913 0.537489 0.531457 0.566713 0.028875 0.016065 0.015810 0.085704 0.072267 0.072993 -0.012329 0.502198 0.497818 0.518244 0.653784 0.627240 0.631156 0.016316 0.155001 0.175061 0.159249 0.378022 0.374157 0.401050 0.581537 0.614574 0.586601 0.980457 1.000000 0.977574 0.590724 0.623561 0.595389 0.076702 0.061508 0.060785 0.036312 0.038011 0.036758 0.559452 0.008033 -0.000065 0.011055 -0.002112 0.005709 -0.008534 0.000241 0.002181 -0.004933 0.001942 -0.001354 0.023618 0.013022 0.006746 -0.001079 -0.011140 0.016610 0.009157 0.002139 -0.005575 0.015755 -0.008220 0.003498 -0.059319 -0.012944 -0.004438 -0.012132 -0.018407
ENTRANCES_AVG -0.002377 0.325433 0.327092 0.332897 0.061623 0.062890 0.064713 0.568360 0.565461 0.575143 0.037452 0.034497 0.033887 0.092882 0.087948 0.088597 -0.017163 0.512221 0.508848 0.511956 0.652995 0.651806 0.652724 0.020374 0.164597 0.169416 0.165812 0.403711 0.403905 0.402711 0.609724 0.610929 0.610304 0.996902 0.977574 1.000000 0.619383 0.623247 0.620071 0.091075 0.087422 0.086365 0.038050 0.042632 0.041513 0.594085 0.009025 0.000257 0.013035 -0.002917 0.006717 -0.010426 -0.000196 -0.000307 -0.004412 -0.000566 -0.000895 0.032358 0.018333 0.014063 -0.002855 -0.012022 0.021492 0.014622 0.005134 -0.006867 0.036256 -0.008986 0.002734 -0.062525 -0.013075 -0.023626 -0.030790 -0.020484
LIVINGAREA_AVG 0.003940 0.544066 0.545263 0.534595 0.136229 0.136504 0.131247 0.884652 0.881894 0.874249 0.458830 0.467477 0.465766 0.352223 0.353487 0.354490 -0.059801 0.503788 0.501159 0.491731 0.673436 0.693521 0.692532 0.065437 0.300217 0.283470 0.298327 0.865624 0.867534 0.851998 0.914218 0.890888 0.913205 0.615481 0.590724 0.619383 1.000000 0.971389 0.995427 0.625755 0.630360 0.628319 0.078552 0.095967 0.092702 0.926029 0.003755 -0.004576 0.034536 0.001753 0.005269 -0.018721 -0.002850 -0.026816 -0.017731 -0.027248 -0.015859 0.096877 0.078335 0.091897 -0.003996 -0.011096 0.084724 0.073658 0.035924 -0.009387 0.214648 0.001366 -0.012905 0.007223 -0.010633 -0.164884 -0.183204 -0.035242
LIVINGAREA_MODE 0.004250 0.519428 0.522608 0.533735 0.127699 0.129633 0.132183 0.858999 0.852971 0.879962 0.444623 0.440933 0.440049 0.344393 0.337070 0.337875 -0.054950 0.505662 0.501364 0.513547 0.690212 0.678494 0.680624 0.060149 0.285037 0.293653 0.288660 0.840316 0.838036 0.855616 0.893655 0.911286 0.896485 0.622494 0.623561 0.623247 0.971389 1.000000 0.974366 0.605886 0.596739 0.595832 0.077018 0.092394 0.088537 0.899386 0.003765 -0.003888 0.031676 0.002418 0.005215 -0.017109 -0.001990 -0.021698 -0.016989 -0.022150 -0.014828 0.085227 0.070392 0.081540 -0.002208 -0.009979 0.075350 0.065696 0.031260 -0.008615 0.182047 0.001559 -0.011724 0.008007 -0.010999 -0.133665 -0.153244 -0.032972
LIVINGAREA_MEDI 0.004374 0.542972 0.545882 0.536113 0.135458 0.136379 0.131377 0.886539 0.879724 0.875901 0.457790 0.465169 0.464294 0.352294 0.352896 0.353607 -0.058606 0.504540 0.501161 0.493367 0.672895 0.690215 0.691541 0.064241 0.296622 0.282900 0.296795 0.868147 0.865341 0.855201 0.913113 0.894377 0.916740 0.619575 0.595389 0.620071 0.995427 0.974366 1.000000 0.623908 0.626875 0.626008 0.078318 0.095415 0.092450 0.920828 0.003507 -0.004357 0.034514 0.001925 0.005444 -0.018801 -0.002907 -0.026049 -0.017193 -0.026479 -0.015416 0.095325 0.077267 0.090548 -0.003830 -0.011248 0.083559 0.072571 0.035275 -0.009594 0.210470 0.001903 -0.013176 0.007687 -0.010498 -0.161042 -0.179341 -0.034857
FLOORSMAX_MODE 0.005201 0.395279 0.394667 0.378377 0.108526 0.107229 0.101441 0.584335 0.584088 0.573404 0.727696 0.730044 0.730901 0.510358 0.511288 0.511258 -0.080548 0.220507 0.219714 0.212257 0.308825 0.328630 0.325050 0.086689 0.248043 0.231310 0.243800 0.669392 0.671129 0.661175 0.613988 0.595375 0.612051 0.086672 0.076702 0.091075 0.625755 0.605886 0.623908 1.000000 0.985669 0.988201 0.109787 0.130294 0.128123 0.626085 0.003030 -0.003126 0.041317 0.001317 0.001660 -0.018166 0.000556 -0.039061 -0.030220 -0.039367 -0.030381 0.129425 0.105551 0.128378 -0.001571 -0.006165 0.113995 0.100857 0.052066 -0.009677 0.303690 0.001685 -0.014106 0.049158 -0.011584 -0.219861 -0.237230 -0.045368
FLOORSMAX_AVG 0.005760 0.401736 0.400655 0.376467 0.113893 0.111649 0.102779 0.590479 0.591459 0.569560 0.723655 0.743030 0.740699 0.508150 0.517014 0.518305 -0.082869 0.217760 0.216961 0.202091 0.298168 0.329492 0.325273 0.089897 0.253370 0.225307 0.247064 0.676676 0.680446 0.656435 0.618444 0.585385 0.614861 0.083234 0.061508 0.087422 0.630360 0.596739 0.626875 0.985669 1.000000 0.997059 0.107363 0.131014 0.129041 0.633646 0.002101 -0.003560 0.043776 0.001105 0.002235 -0.018978 -0.000114 -0.040739 -0.030484 -0.041030 -0.030619 0.135144 0.108699 0.132397 -0.002280 -0.006622 0.119406 0.103899 0.054379 -0.009643 0.322096 0.002227 -0.014993 0.049425 -0.011297 -0.235021 -0.251429 -0.046041
FLOORSMAX_MEDI 0.005355 0.400223 0.399626 0.375441 0.112877 0.111432 0.102801 0.588101 0.588124 0.567508 0.724492 0.740669 0.741322 0.508380 0.517300 0.517313 -0.082545 0.217653 0.216819 0.202247 0.297787 0.327642 0.323735 0.088767 0.252478 0.225592 0.247169 0.676016 0.678042 0.655700 0.616228 0.584368 0.613871 0.081517 0.060785 0.086365 0.628319 0.595832 0.626008 0.988201 0.997059 1.000000 0.107193 0.130742 0.129143 0.630983 0.002460 -0.003372 0.043082 0.001254 0.002082 -0.019034 0.000021 -0.040378 -0.030446 -0.040664 -0.030693 0.133912 0.108049 0.131363 -0.001868 -0.006550 0.118014 0.103290 0.053956 -0.009383 0.317838 0.002280 -0.015050 0.049661 -0.011444 -0.231637 -0.248063 -0.045861
YEARS_BEGINEXPLUATATION_MODE 0.002445 0.050956 0.051044 0.049195 0.020760 0.020289 0.019254 0.088925 0.088665 0.087476 0.100572 0.101034 0.100881 0.302129 0.299885 0.299906 0.001837 0.054186 0.053952 0.052933 0.059862 0.061477 0.060960 -0.002320 -0.008654 -0.004004 -0.009434 0.073841 0.073437 0.077213 0.096675 0.101931 0.096835 0.037591 0.036312 0.038050 0.078552 0.077018 0.078318 0.109787 0.107363 0.107193 1.000000 0.972994 0.966071 0.099119 -0.002873 0.002492 -0.000614 0.003927 -0.000412 -0.007690 0.001574 -0.000131 -0.003751 -0.000038 -0.003759 0.007867 0.006882 0.015115 0.007266 0.001918 -0.011315 0.005819 0.005204 0.006001 -0.006707 0.001740 0.008376 0.010382 -0.001100 0.004547 -0.000838 -0.009553
YEARS_BEGINEXPLUATATION_AVG 0.002513 0.095025 0.095260 0.090068 0.035872 0.034919 0.032312 0.153387 0.152964 0.148304 0.168074 0.172300 0.171914 0.492266 0.497321 0.497986 -0.000012 0.076599 0.076331 0.072452 0.083918 0.089229 0.088566 -0.001223 0.012008 0.010092 0.010814 0.079690 0.079682 0.079978 0.101424 0.101663 0.101344 0.041857 0.038011 0.042632 0.095967 0.092394 0.095415 0.130294 0.131014 0.130742 0.972994 1.000000 0.994221 0.101522 -0.003130 0.003277 -0.001142 0.003716 0.000386 -0.008031 0.001664 -0.000455 -0.005138 -0.000371 -0.005337 0.008709 0.008124 0.015545 0.007922 0.003131 -0.010619 0.007028 0.005564 0.006926 -0.006570 0.002015 0.008846 0.012817 -0.002182 0.004508 -0.000733 -0.010557
YEARS_BEGINEXPLUATATION_MEDI 0.002298 0.078857 0.079089 0.074009 0.032569 0.031826 0.029473 0.131092 0.130738 0.126255 0.148876 0.152133 0.152238 0.438762 0.443892 0.443345 0.000043 0.071351 0.071097 0.067231 0.076458 0.081782 0.081095 -0.000907 0.013086 0.008034 0.012117 0.078694 0.078568 0.079048 0.100973 0.100944 0.101165 0.040780 0.036758 0.041513 0.092702 0.088537 0.092450 0.128123 0.129041 0.129143 0.966071 0.994221 1.000000 0.100343 -0.002702 0.002431 -0.000934 0.003707 0.000400 -0.007979 0.001780 -0.000366 -0.005232 -0.000286 -0.005390 0.008639 0.007677 0.015242 0.007665 0.003533 -0.010170 0.006500 0.005571 0.006571 -0.006542 0.002051 0.008620 0.012831 -0.001754 0.004512 -0.000644 -0.010934
TOTALAREA_MODE 0.003307 0.550656 0.550483 0.541181 0.144837 0.144587 0.139331 0.847531 0.849248 0.834733 0.446324 0.456486 0.454403 0.355397 0.357755 0.359051 -0.061077 0.493214 0.491015 0.479343 0.648240 0.673316 0.669533 0.063785 0.365713 0.345283 0.360934 0.838507 0.845008 0.820160 0.892090 0.862033 0.886104 0.587397 0.559452 0.594085 0.926029 0.899386 0.920828 0.626085 0.633646 0.630983 0.099119 0.101522 0.100343 1.000000 0.004567 -0.003647 0.033840 0.002358 0.005557 -0.018790 -0.003923 -0.027016 -0.018859 -0.027462 -0.017370 0.094737 0.078645 0.092692 -0.001398 -0.008522 0.080780 0.074399 0.037922 -0.006763 0.203455 0.002688 -0.014987 0.019829 -0.010000 -0.161519 -0.178946 -0.035540
EXT_SOURCE_3 -0.000007 -0.005499 -0.005625 -0.004424 0.009442 0.008861 0.008848 0.000900 0.001055 0.001906 0.003778 0.002409 0.002280 0.014674 0.015024 0.015181 -0.013837 0.009260 0.009236 0.008100 0.004110 0.005423 0.005378 0.185211 -0.002831 -0.003298 -0.003549 0.006905 0.006803 0.006742 0.003258 0.002695 0.002998 0.008871 0.008033 0.009025 0.003755 0.003765 0.003507 0.003030 0.002101 0.002460 -0.002873 -0.003130 -0.002702 0.004567 1.000000 -0.020485 -0.008664 -0.001117 -0.008654 -0.072853 -0.023523 -0.000080 -0.034924 0.000248 -0.038208 0.109183 0.047128 0.029045 -0.029311 -0.075542 -0.040533 0.043049 -0.029240 -0.043570 -0.006362 -0.206463 0.114225 -0.106684 -0.131930 -0.012732 -0.012105 -0.180865
AMT_REQ_CREDIT_BUREAU_WEEK 0.001299 -0.009497 -0.009552 -0.008405 -0.003654 -0.003997 -0.004205 -0.007432 -0.007485 -0.007015 -0.001291 -0.001575 -0.000978 -0.006569 -0.006244 -0.006283 0.003276 0.005231 0.007634 0.005646 -0.002767 -0.002262 -0.002666 -0.002503 -0.007740 -0.007128 -0.007969 -0.003133 -0.003030 -0.002594 -0.003478 -0.003112 -0.003533 -0.000142 -0.000065 0.000257 -0.004576 -0.003888 -0.004357 -0.003126 -0.003560 -0.003372 0.002492 0.003277 0.002431 -0.003647 -0.020485 1.000000 -0.014782 0.004792 0.221089 0.016939 -0.014195 -0.001789 -0.003369 -0.001919 -0.003194 0.001740 -0.001594 0.013018 -0.002436 -0.002318 -0.004517 -0.001802 0.001770 -0.003201 -0.003104 -0.000823 0.002864 -0.001097 -0.002042 0.003056 0.002039 -0.001428
AMT_REQ_CREDIT_BUREAU_MON 0.000227 0.022451 0.022149 0.019809 -0.000560 -0.000965 -0.001375 0.032529 0.032595 0.030218 0.035653 0.039477 0.038721 -0.004297 -0.004164 -0.004172 -0.022521 0.011826 0.012075 0.010784 0.019158 0.020907 0.021017 0.031976 0.012384 0.009529 0.011464 0.040722 0.040755 0.038436 0.034102 0.031627 0.033987 0.013349 0.011055 0.013035 0.034536 0.031676 0.034514 0.041317 0.043776 0.043082 -0.000614 -0.001142 -0.000934 0.033840 -0.008664 -0.014782 1.000000 -0.000423 -0.006517 -0.005589 -0.008322 0.000739 -0.003774 0.000688 -0.000706 0.052036 0.056476 0.038745 -0.007124 -0.041114 0.036501 0.054457 0.022868 -0.009941 0.078099 0.003435 -0.035039 -0.010973 -0.008832 -0.069076 -0.067108 -0.012376
AMT_REQ_CREDIT_BUREAU_HOUR -0.002844 0.006416 0.006569 0.006513 0.000469 0.000675 -0.000420 0.002651 0.002833 0.003853 0.003737 0.003833 0.003881 0.001198 0.001142 0.001230 0.003907 -0.001021 -0.001104 -0.000234 -0.000325 -0.001259 -0.001086 -0.006640 0.002492 0.002035 0.001938 0.000570 0.000927 0.000721 0.001789 0.002413 0.001865 -0.002721 -0.002112 -0.002917 0.001753 0.002418 0.001925 0.001317 0.001105 0.001254 0.003927 0.003716 0.003707 0.002358 -0.001117 0.004792 -0.000423 1.000000 0.219818 -0.004533 -0.003131 -0.000042 -0.004294 0.000002 -0.002580 -0.003003 -0.003191 0.003610 0.000645 -0.000615 -0.017674 -0.003724 0.000290 -0.000417 -0.003025 0.003899 -0.003969 -0.001868 0.004427 0.006634 0.006760 -0.000547
AMT_REQ_CREDIT_BUREAU_DAY -0.001018 -0.000265 -0.000085 0.000204 -0.001643 -0.001680 -0.001305 0.003484 0.003390 0.003741 0.003338 0.003686 0.003681 0.001962 0.003460 0.003057 -0.006480 0.005569 0.005682 0.005862 0.004118 0.004760 0.005041 -0.004104 0.001485 0.000482 0.000551 0.002988 0.003282 0.003161 0.004611 0.004402 0.004587 0.006788 0.005709 0.006717 0.005269 0.005215 0.005444 0.001660 0.002235 0.002082 -0.000412 0.000386 0.000400 0.005557 -0.008654 0.221089 -0.006517 0.219818 1.000000 -0.003451 -0.004329 -0.002258 -0.002209 -0.002236 -0.001373 -0.000246 0.004451 0.001429 -0.000485 0.002352 0.000075 0.004057 0.002500 0.000581 0.001361 0.002007 0.001232 -0.000931 -0.002177 -0.001510 -0.001322 0.000813
AMT_REQ_CREDIT_BUREAU_YEAR 0.004930 -0.014661 -0.014401 -0.013372 0.001379 0.001970 0.002258 -0.013095 -0.012730 -0.012366 -0.008855 -0.010269 -0.010540 -0.020694 -0.021299 -0.021440 -0.015641 -0.011681 -0.012393 -0.010501 -0.011166 -0.012728 -0.012201 0.005301 -0.009466 -0.007350 -0.008823 -0.016773 -0.017063 -0.015946 -0.015733 -0.013849 -0.015310 -0.010360 -0.008534 -0.010426 -0.018721 -0.017109 -0.018801 -0.018166 -0.018978 -0.019034 -0.007690 -0.008031 -0.007979 -0.018790 -0.072853 0.016939 -0.005589 -0.004533 -0.003451 1.000000 0.073030 0.034751 0.016694 0.034265 0.019272 -0.022484 -0.051730 -0.011349 -0.028808 -0.113448 -0.030689 -0.049236 0.010620 -0.041786 0.002898 -0.072728 0.049800 -0.025366 -0.034662 0.010981 0.010322 0.018896
AMT_REQ_CREDIT_BUREAU_QRT -0.000050 -0.010515 -0.010050 -0.009280 0.002805 0.003295 0.003143 -0.008347 -0.008789 -0.008217 -0.004238 -0.004978 -0.004967 -0.006423 -0.007438 -0.007304 -0.017527 0.006480 0.006054 0.006728 -0.002863 -0.003567 -0.003754 -0.002403 -0.002690 -0.001136 -0.002270 -0.004685 -0.005053 -0.004350 -0.002850 -0.002504 -0.002789 -0.000025 0.000241 -0.000196 -0.002850 -0.001990 -0.002907 0.000556 -0.000114 0.000021 0.001574 0.001664 0.001780 -0.003923 -0.023523 -0.014195 -0.008322 -0.003131 -0.004329 0.073030 1.000000 0.004368 -0.000078 0.004627 -0.000950 -0.003633 0.015635 0.009594 -0.005218 -0.002055 -0.000416 0.015057 0.004531 -0.008286 -0.000677 -0.011702 0.014332 -0.000095 -0.007338 0.005321 0.004850 -0.002230
OBS_60_CNT_SOCIAL_CIRCLE -0.001489 -0.020677 -0.020014 -0.016636 -0.001056 -0.000561 -0.000231 -0.028310 -0.028427 -0.025142 -0.035979 -0.038168 -0.037967 0.001401 0.000646 0.000507 0.005161 -0.003551 -0.003694 -0.002552 -0.010674 -0.015154 -0.014443 -0.026333 -0.017470 -0.013123 -0.015676 -0.034902 -0.035805 -0.031987 -0.024016 -0.020334 -0.023605 0.000122 0.002181 -0.000307 -0.026816 -0.021698 -0.026049 -0.039061 -0.040739 -0.040378 -0.000131 -0.000455 -0.000366 -0.027016 -0.000080 -0.001789 0.000739 -0.000042 -0.002258 0.034751 0.004368 1.000000 0.234584 0.998362 0.308842 -0.019123 0.001816 -0.010986 0.025977 -0.015177 -0.010677 0.001722 -0.012351 0.015323 -0.010509 0.006292 0.006044 0.009425 -0.012644 0.034230 0.029777 0.009144
DEF_60_CNT_SOCIAL_CIRCLE 0.000678 -0.014209 -0.013928 -0.013215 -0.001319 -0.000911 -0.000246 -0.016995 -0.017006 -0.017523 -0.022872 -0.023657 -0.023556 -0.011099 -0.011636 -0.011478 0.011677 -0.001748 -0.001492 -0.002895 -0.011675 -0.013251 -0.012850 -0.030973 -0.013272 -0.012010 -0.012892 -0.023556 -0.024005 -0.023458 -0.016403 -0.016230 -0.016286 -0.004201 -0.004933 -0.004412 -0.017731 -0.016989 -0.017193 -0.030220 -0.030484 -0.030446 -0.003751 -0.005138 -0.005232 -0.018859 -0.034924 -0.003369 -0.003774 -0.004294 -0.002209 0.016694 -0.000078 0.234584 1.000000 0.232368 0.859132 -0.033888 -0.023002 -0.023382 -0.005347 0.002201 -0.009769 -0.022172 -0.012178 -0.003045 0.001552 0.001259 0.014949 0.004320 0.004500 0.017643 0.016739 0.029870
OBS_30_CNT_SOCIAL_CIRCLE -0.001404 -0.021039 -0.020368 -0.016998 -0.001377 -0.000880 -0.000553 -0.028816 -0.028928 -0.025629 -0.036522 -0.038671 -0.038457 0.001537 0.000839 0.000709 0.005222 -0.003813 -0.003964 -0.002832 -0.011010 -0.015466 -0.014743 -0.026887 -0.017583 -0.013260 -0.015779 -0.035381 -0.036295 -0.032493 -0.024522 -0.020814 -0.024106 -0.000143 0.001942 -0.000566 -0.027248 -0.022150 -0.026479 -0.039367 -0.041030 -0.040664 -0.000038 -0.000371 -0.000286 -0.027462 0.000248 -0.001919 0.000688 0.000002 -0.002236 0.034265 0.004627 0.998362 0.232368 1.000000 0.306435 -0.019501 0.001799 -0.011256 0.026342 -0.014661 -0.010689 0.001677 -0.012438 0.015670 -0.010980 0.006664 0.005798 0.009426 -0.012238 0.034598 0.030115 0.009272
DEF_30_CNT_SOCIAL_CIRCLE -0.000575 -0.012428 -0.012346 -0.011801 0.001349 0.001888 0.003069 -0.015635 -0.015667 -0.016124 -0.025390 -0.026169 -0.026158 -0.010162 -0.010555 -0.010424 0.007421 -0.002895 -0.002509 -0.003729 -0.009459 -0.010879 -0.010474 -0.028715 -0.013243 -0.011674 -0.012713 -0.022742 -0.023109 -0.022422 -0.013851 -0.013413 -0.013698 -0.000596 -0.001354 -0.000895 -0.015859 -0.014828 -0.015416 -0.030381 -0.030619 -0.030693 -0.003759 -0.005337 -0.005390 -0.017370 -0.038208 -0.003194 -0.000706 -0.002580 -0.001373 0.019272 -0.000950 0.308842 0.859132 0.306435 1.000000 -0.032222 -0.020983 -0.022416 -0.002822 0.000701 -0.006368 -0.019980 -0.012462 -0.001948 0.006005 -0.000538 0.017882 0.002464 0.002850 0.015480 0.014089 0.031837
EXT_SOURCE_2 0.001123 0.053179 0.051516 0.043665 0.019233 0.018113 0.016875 0.078604 0.080303 0.071318 0.106986 0.112450 0.111551 0.007695 0.010393 0.010791 -0.081239 0.021615 0.022506 0.017290 0.037158 0.047843 0.046458 0.213917 0.045519 0.037709 0.043267 0.113715 0.115388 0.106503 0.090343 0.079769 0.088616 0.031061 0.023618 0.032358 0.096877 0.085227 0.095325 0.129425 0.135144 0.133912 0.007867 0.008709 0.008639 0.094737 0.109183 0.001740 0.052036 -0.003003 -0.000246 -0.022484 -0.003633 -0.019123 -0.033888 -0.019501 -0.032222 1.000000 0.139108 0.125559 -0.001857 -0.195827 0.156600 0.131146 0.054966 -0.017545 0.198794 -0.091607 -0.019670 -0.058838 -0.050631 -0.291729 -0.287190 -0.159698
AMT_GOODS_PRICE 0.000227 0.049932 0.048917 0.041974 0.014541 0.013412 0.010851 0.061198 0.062989 0.054533 0.076515 0.080338 0.079628 0.038318 0.039981 0.040326 -0.106258 0.011375 0.011802 0.007835 0.037724 0.045509 0.043617 0.174615 0.044956 0.039309 0.042839 0.083950 0.085197 0.079855 0.067394 0.059877 0.065768 0.017277 0.013022 0.018333 0.078335 0.070392 0.077267 0.105551 0.108699 0.108049 0.006882 0.008124 0.007677 0.078645 0.047128 -0.001594 0.056476 -0.003191 0.004451 -0.051730 0.015635 0.001816 -0.023002 0.001799 -0.020983 0.139108 1.000000 0.774414 0.060464 -0.076893 0.062811 0.986998 0.146114 -0.002337 0.105018 -0.053663 -0.064092 0.012095 -0.008840 -0.104647 -0.113207 -0.039304
AMT_ANNUITY -0.000003 0.056695 0.055852 0.047572 0.022276 0.021405 0.017211 0.074110 0.076515 0.065992 0.094729 0.100250 0.098972 0.030641 0.032850 0.033351 -0.099371 0.005896 0.006374 0.001457 0.036378 0.046552 0.044472 0.119410 0.054684 0.045902 0.051977 0.102732 0.104433 0.096243 0.079158 0.068941 0.077002 0.012701 0.006746 0.014063 0.091897 0.081540 0.090548 0.128378 0.132397 0.131363 0.015115 0.015545 0.015242 0.092692 0.029045 0.013018 0.038745 0.003610 0.001429 -0.011349 0.009594 -0.010986 -0.023382 -0.011256 -0.022416 0.125559 0.774414 1.000000 0.075081 -0.064906 0.053074 0.769449 0.175849 0.020850 0.119916 0.008731 -0.103850 0.038813 0.011894 -0.129451 -0.143008 -0.012715
CNT_FAM_MEMBERS -0.002231 0.000262 0.000731 0.000838 0.002755 0.003062 0.002576 -0.004163 -0.004810 -0.004381 -0.001186 -0.002877 -0.002178 0.041360 0.041839 0.041869 -0.015176 0.000430 0.000102 0.001572 -0.004981 -0.005527 -0.005794 -0.096102 0.004982 0.005223 0.005025 0.000133 -0.000277 0.001385 -0.010062 -0.008183 -0.009991 -0.003046 -0.001079 -0.002855 -0.003996 -0.002208 -0.003830 -0.001571 -0.002280 -0.001868 0.007266 0.007922 0.007665 -0.001398 -0.029311 -0.002436 -0.007124 0.000645 -0.000485 -0.028808 -0.005218 0.025977 -0.005347 0.026342 -0.002822 -0.001857 0.060464 0.075081 1.000000 -0.027481 -0.012143 0.062528 0.015713 0.878837 -0.024273 0.278429 -0.233456 0.174431 -0.020803 0.030923 0.031620 0.010330
DAYS_LAST_PHONE_CHANGE 0.000776 -0.002659 -0.002478 -0.000391 0.001123 0.001182 0.000882 -0.002901 -0.003382 -0.003171 -0.006971 -0.007270 -0.007243 0.011749 0.011615 0.011920 0.002689 -0.000237 0.000591 -0.000183 -0.005732 -0.006458 -0.007090 -0.130211 -0.004054 -0.004427 -0.004454 -0.011752 -0.011418 -0.010413 -0.008792 -0.007959 -0.009078 -0.012220 -0.011140 -0.012022 -0.011096 -0.009979 -0.011248 -0.006165 -0.006622 -0.006550 0.001918 0.003131 0.003533 -0.008522 -0.075542 -0.002318 -0.041114 -0.000615 0.002352 -0.113448 -0.002055 -0.015177 0.002201 -0.014661 0.000701 -0.195827 -0.076893 -0.064906 -0.027481 1.000000 -0.015647 -0.074388 -0.017254 -0.006180 -0.046043 0.083957 0.023129 0.056938 0.086779 0.026558 0.025939 0.054953
HOUR_APPR_PROCESS_START 0.000205 0.047662 0.046151 0.040003 0.014680 0.014174 0.012107 0.078353 0.079959 0.072238 0.113720 0.119442 0.118550 -0.016409 -0.014470 -0.014282 -0.069504 0.014274 0.014503 0.011613 0.034527 0.041399 0.040947 0.032487 0.044565 0.038635 0.043368 0.105367 0.106407 0.099335 0.083651 0.074785 0.082497 0.021172 0.016610 0.021492 0.084724 0.075350 0.083559 0.113995 0.119406 0.118014 -0.011315 -0.010619 -0.010170 0.080780 -0.040533 -0.004517 0.036501 -0.017674 0.000075 -0.030689 -0.000416 -0.010677 -0.009769 -0.010689 -0.006368 0.156600 0.062811 0.053074 -0.012143 -0.015647 1.000000 0.053257 0.033784 -0.006909 0.171821 0.092099 -0.090384 -0.011111 0.032615 -0.285609 -0.265247 -0.022945
AMT_CREDIT 0.000214 0.049198 0.048203 0.041446 0.013413 0.012401 0.010076 0.058731 0.060508 0.052481 0.074611 0.078129 0.077513 0.033075 0.034655 0.034931 -0.096874 0.004690 0.005175 0.001402 0.033595 0.041226 0.039395 0.167599 0.040894 0.035297 0.038741 0.081052 0.082385 0.076927 0.063280 0.055799 0.061599 0.013505 0.009157 0.014622 0.073658 0.065696 0.072571 0.100857 0.103899 0.103290 0.005819 0.007028 0.006500 0.074399 0.043049 -0.001802 0.054457 -0.003724 0.004057 -0.049236 0.015057 0.001722 -0.022172 0.001677 -0.019980 0.131146 0.986998 0.769449 0.062528 -0.074388 0.053257 1.000000 0.143687 0.001776 0.101220 -0.055576 -0.066224 0.010353 -0.006176 -0.102672 -0.111988 -0.030187
AMT_INCOME_TOTAL -0.001795 0.086203 0.084201 0.072656 0.030406 0.028913 0.025624 0.105237 0.107432 0.092782 0.130492 0.139013 0.137605 0.038279 0.042482 0.042782 -0.119654 -0.002390 -0.002143 -0.004020 0.011618 0.015454 0.014711 0.023251 0.077089 0.064064 0.073302 0.039690 0.040491 0.036894 0.031310 0.027137 0.030678 0.004576 0.002139 0.005134 0.035924 0.031260 0.035275 0.052066 0.054379 0.053956 0.005204 0.005564 0.005571 0.037922 -0.029240 0.001770 0.022868 0.000290 0.002500 0.010620 0.004531 -0.012351 -0.012178 -0.012438 -0.012462 0.054966 0.146114 0.175849 0.015713 -0.017254 0.033784 0.143687 1.000000 0.012452 0.068597 0.025544 -0.058891 0.025475 0.008070 -0.078886 -0.084670 -0.002481
CNT_CHILDREN -0.000688 -0.000503 -0.000145 -0.000906 0.004179 0.004442 0.004294 -0.005822 -0.006488 -0.006230 -0.009376 -0.010143 -0.009670 0.029196 0.029595 0.029646 0.009539 -0.004147 -0.004457 -0.003953 -0.009291 -0.009050 -0.009238 -0.138459 0.003166 0.002664 0.003049 -0.005835 -0.006032 -0.005723 -0.012330 -0.011344 -0.012264 -0.006975 -0.005575 -0.006867 -0.009387 -0.008615 -0.009594 -0.009677 -0.009643 -0.009383 0.006001 0.006926 0.006571 -0.006763 -0.043570 -0.003201 -0.009941 -0.000417 0.000581 -0.041786 -0.008286 0.015323 -0.003045 0.015670 -0.001948 -0.017545 -0.002337 0.020850 0.878837 -0.006180 -0.006909 0.001776 0.012452 1.000000 -0.025826 0.331623 -0.240468 0.183940 -0.028503 0.025528 0.024614 0.019552
REGION_POPULATION_RELATIVE 0.001271 0.168101 0.163327 0.134159 0.024268 0.021699 0.016331 0.190426 0.195956 0.164517 0.273877 0.292362 0.288614 -0.064028 -0.058163 -0.057069 -0.082891 -0.053101 -0.051987 -0.061096 0.066314 0.098987 0.094199 0.098941 0.076143 0.051838 0.067917 0.275174 0.281380 0.252585 0.206390 0.175433 0.201838 0.033167 0.015755 0.036256 0.214648 0.182047 0.210470 0.303690 0.322096 0.317838 -0.006707 -0.006570 -0.006542 0.203455 -0.006362 -0.003104 0.078099 -0.003025 0.001361 0.002898 -0.000677 -0.010509 0.001552 -0.010980 0.006005 0.198794 0.105018 0.119916 -0.024273 -0.046043 0.171821 0.101220 0.068597 -0.025826 1.000000 -0.029078 -0.003825 -0.052062 -0.003950 -0.532986 -0.531728 -0.037004
DAYS_BIRTH -0.000841 0.006585 0.007296 0.007584 0.000849 0.000777 0.001163 0.013687 0.013299 0.013336 0.000420 0.001133 0.001302 0.025823 0.027171 0.026899 0.007699 0.004539 0.004210 0.004763 -0.002691 -0.002384 -0.002358 -0.598890 0.004914 0.004090 0.005578 -0.000223 -0.000371 -0.000107 0.006776 0.006669 0.006985 -0.008534 -0.008220 -0.008986 0.001366 0.001559 0.001903 0.001685 0.002227 0.002280 0.001740 0.002015 0.002051 0.002688 -0.206463 -0.000823 0.003435 0.003899 0.002007 -0.072728 -0.011702 0.006292 0.001259 0.006664 -0.000538 -0.091607 -0.053663 0.008731 0.278429 0.083957 0.092099 -0.055576 0.025544 0.331623 -0.029078 1.000000 -0.615504 0.331472 0.272287 0.008738 0.007549 0.078418
DAYS_EMPLOYED 0.001274 -0.008967 -0.009276 -0.009378 -0.002721 -0.002782 -0.003421 -0.020043 -0.020296 -0.019826 -0.013644 -0.014006 -0.014512 -0.006851 -0.007974 -0.007603 0.028075 -0.011408 -0.011420 -0.010425 -0.000176 -0.001224 -0.001120 0.289068 -0.014019 -0.012948 -0.014081 -0.008678 -0.008651 -0.008199 -0.017006 -0.015507 -0.016718 0.002773 0.003498 0.002734 -0.012905 -0.011724 -0.013176 -0.014106 -0.014993 -0.015050 0.008376 0.008846 0.008620 -0.014987 0.114225 0.002864 -0.035039 -0.003969 0.001232 0.049800 0.014332 0.006044 0.014949 0.005798 0.017882 -0.019670 -0.064092 -0.103850 -0.233456 0.023129 -0.090384 -0.066224 -0.058891 -0.240468 -0.003825 -0.615504 1.000000 -0.210273 -0.272791 0.032585 0.034407 -0.045064
DAYS_REGISTRATION -0.000630 0.024592 0.025303 0.025497 0.035364 0.034240 0.032723 0.025284 0.024839 0.023973 0.019499 0.020757 0.020821 0.163429 0.164861 0.165196 -0.025165 0.003442 0.003438 0.004006 -0.018812 -0.020079 -0.020656 -0.178719 0.052079 0.049933 0.052898 0.000790 -0.000080 0.001957 0.013472 0.013142 0.013577 -0.062268 -0.059319 -0.062525 0.007223 0.008007 0.007687 0.049158 0.049425 0.049661 0.010382 0.012817 0.012831 0.019829 -0.106684 -0.001097 -0.010973 -0.001868 -0.000931 -0.025366 -0.000095 0.009425 0.004320 0.009426 0.002464 -0.058838 0.012095 0.038813 0.174431 0.056938 -0.011111 0.010353 0.025475 0.183940 -0.052062 0.331472 -0.210273 1.000000 0.101934 0.079297 0.072988 0.040217
DAYS_ID_PUBLISH -0.000887 -0.000485 -0.000236 -0.000491 -0.008094 -0.007466 -0.007737 0.000204 0.000710 0.000049 -0.009859 -0.009386 -0.009253 -0.009393 -0.009253 -0.009454 0.008747 -0.005515 -0.005355 -0.005961 -0.011839 -0.012849 -0.013046 -0.132527 0.001327 0.000026 0.001627 -0.010731 -0.010767 -0.010496 -0.006499 -0.006220 -0.006452 -0.013221 -0.012944 -0.013075 -0.010633 -0.010999 -0.010498 -0.011584 -0.011297 -0.011444 -0.001100 -0.002182 -0.001754 -0.010000 -0.131930 -0.002042 -0.008832 0.004427 -0.002177 -0.034662 -0.007338 -0.012644 0.004500 -0.012238 0.002850 -0.050631 -0.008840 0.011894 -0.020803 0.086779 0.032615 -0.006176 0.008070 -0.028503 -0.003950 0.272287 -0.272791 0.101934 1.000000 -0.005385 -0.008018 0.051695
REGION_RATING_CLIENT -0.001853 -0.120701 -0.117366 -0.095498 -0.018347 -0.015891 -0.010272 -0.152176 -0.156766 -0.129571 -0.215123 -0.229994 -0.227258 0.048298 0.043189 0.042167 0.086297 0.046965 0.045123 0.058796 -0.032146 -0.061396 -0.057318 -0.113677 -0.082002 -0.059503 -0.075075 -0.221633 -0.227037 -0.201538 -0.152610 -0.123397 -0.148410 -0.021531 -0.004438 -0.023626 -0.164884 -0.133665 -0.161042 -0.219861 -0.235021 -0.231637 0.004547 0.004508 0.004512 -0.161519 -0.012732 0.003056 -0.069076 0.006634 -0.001510 0.010981 0.005321 0.034230 0.017643 0.034598 0.015480 -0.291729 -0.104647 -0.129451 0.030923 0.026558 -0.285609 -0.102672 -0.078886 0.025528 -0.532986 0.008738 0.032585 0.079297 -0.005385 1.000000 0.950316 0.058141
REGION_RATING_CLIENT_W_CITY -0.001741 -0.130876 -0.127754 -0.107276 -0.021329 -0.019139 -0.014123 -0.176999 -0.181184 -0.155851 -0.222929 -0.236985 -0.234200 0.040781 0.036414 0.035435 0.087654 0.037945 0.036342 0.048524 -0.046738 -0.074168 -0.070167 -0.113373 -0.082463 -0.060918 -0.075988 -0.233193 -0.238425 -0.213864 -0.172048 -0.143996 -0.167831 -0.028446 -0.012132 -0.030790 -0.183204 -0.153244 -0.179341 -0.237230 -0.251429 -0.248063 -0.000838 -0.000733 -0.000644 -0.178946 -0.012105 0.002039 -0.067108 0.006760 -0.001322 0.010322 0.004850 0.029777 0.016739 0.030115 0.014089 -0.287190 -0.113207 -0.143008 0.031620 0.025939 -0.265247 -0.111988 -0.084670 0.024614 -0.531728 0.007549 0.034407 0.072988 -0.008018 0.950316 1.000000 0.059963
TARGET -0.000581 -0.021858 -0.021818 -0.019588 -0.003702 -0.002904 -0.001785 -0.025916 -0.026580 -0.024955 -0.033119 -0.033705 -0.033636 -0.025586 -0.025933 -0.025685 0.039531 -0.013984 -0.013539 -0.012519 -0.021323 -0.023834 -0.023122 -0.155781 -0.012034 -0.010751 -0.011442 -0.035791 -0.036381 -0.034306 -0.031644 -0.029427 -0.031137 -0.020116 -0.018407 -0.020484 -0.035242 -0.032972 -0.034857 -0.045368 -0.046041 -0.045861 -0.009553 -0.010557 -0.010934 -0.035540 -0.180865 -0.001428 -0.012376 -0.000547 0.000813 0.018896 -0.002230 0.009144 0.029870 0.009272 0.031837 -0.159698 -0.039304 -0.012715 0.010330 0.054953 -0.022945 -0.030187 -0.002481 0.019552 -0.037004 0.078418 -0.045064 0.040217 0.051695 0.058141 0.059963 1.000000
In [20]:
f_aux.get_corr_matrix(dataset = df_loan_train[list_var_continuous], 
                metodo='pearson', size_figure=[10,8])
[Figure: heatmap of the Pearson correlation matrix for the continuous variables]
Out[20]:
0

Two of the observed correlations stand out:

  1. AMT_CREDIT and AMT_ANNUITY have a positive correlation of about 77% (0.769): the larger the amount lent to the client, the larger the loan annuity.

  2. AMT_CREDIT and AMT_GOODS_PRICE have a positive linear correlation of about 99% (0.987): the larger the amount lent to the client, the higher the price of the goods the loan was granted for. This is to be expected.

Beyond these two correlations, TARGET is not strongly linearly correlated with any of the continuous variables, so no single variable explains the behaviour of our target on its own.
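Strongly correlated pairs such as the two above can also be listed programmatically instead of being read off the heatmap. The following is a minimal sketch, not the notebook's own method: the helper `top_correlated_pairs` and the 0.75 threshold are hypothetical, and the toy matrix only reuses three coefficients copied from the output above.

```python
import numpy as np
import pandas as pd

def top_correlated_pairs(corr: pd.DataFrame, threshold: float = 0.75) -> pd.Series:
    """Return variable pairs whose absolute correlation is >= threshold."""
    # Keep only the strict upper triangle so each pair appears once
    # and the diagonal (self-correlation of 1) is excluded.
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(mask).stack().dropna()
    strong = pairs[pairs.abs() >= threshold]
    # Order by absolute strength, strongest first.
    return strong.sort_values(key=lambda s: s.abs(), ascending=False)

# Toy matrix built from three coefficients shown in the output above.
cols = ["AMT_CREDIT", "AMT_GOODS_PRICE", "AMT_ANNUITY"]
toy_corr = pd.DataFrame(
    [[1.0, 0.986998, 0.769449],
     [0.986998, 1.0, 0.774414],
     [0.769449, 0.774414, 1.0]],
    index=cols, columns=cols,
)

strong_pairs = top_correlated_pairs(toy_corr, threshold=0.75)
print(strong_pairs)
```

On the real data the same helper could be called on `corr`, assuming it holds the full Pearson matrix. The many `_AVG` / `_MEDI` / `_MODE` triplets would then dominate the list, since they measure nearly the same quantity.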

In [21]:
corr.loc['TARGET'].sort_values(ascending=False)
Out[21]:
TARGET                          1.000000
DAYS_BIRTH                      0.078418
REGION_RATING_CLIENT_W_CITY     0.059963
REGION_RATING_CLIENT            0.058141
DAYS_LAST_PHONE_CHANGE          0.054953
DAYS_ID_PUBLISH                 0.051695
DAYS_REGISTRATION               0.040217
OWN_CAR_AGE                     0.039531
DEF_30_CNT_SOCIAL_CIRCLE        0.031837
DEF_60_CNT_SOCIAL_CIRCLE        0.029870
CNT_CHILDREN                    0.019552
AMT_REQ_CREDIT_BUREAU_YEAR      0.018896
CNT_FAM_MEMBERS                 0.010330
OBS_30_CNT_SOCIAL_CIRCLE        0.009272
OBS_60_CNT_SOCIAL_CIRCLE        0.009144
AMT_REQ_CREDIT_BUREAU_DAY       0.000813
AMT_REQ_CREDIT_BUREAU_HOUR     -0.000547
SK_ID_CURR                     -0.000581
AMT_REQ_CREDIT_BUREAU_WEEK     -0.001428
NONLIVINGAPARTMENTS_MODE       -0.001785
AMT_REQ_CREDIT_BUREAU_QRT      -0.002230
AMT_INCOME_TOTAL               -0.002481
NONLIVINGAPARTMENTS_MEDI       -0.002904
NONLIVINGAPARTMENTS_AVG        -0.003702
YEARS_BEGINEXPLUATATION_MODE   -0.009553
YEARS_BEGINEXPLUATATION_AVG    -0.010557
NONLIVINGAREA_MODE             -0.010751
YEARS_BEGINEXPLUATATION_MEDI   -0.010934
NONLIVINGAREA_MEDI             -0.011442
NONLIVINGAREA_AVG              -0.012034
AMT_REQ_CREDIT_BUREAU_MON      -0.012376
LANDAREA_MODE                  -0.012519
AMT_ANNUITY                    -0.012715
LANDAREA_AVG                   -0.013539
LANDAREA_MEDI                  -0.013984
ENTRANCES_MODE                 -0.018407
COMMONAREA_MODE                -0.019588
ENTRANCES_MEDI                 -0.020116
ENTRANCES_AVG                  -0.020484
BASEMENTAREA_MODE              -0.021323
COMMONAREA_MEDI                -0.021818
COMMONAREA_AVG                 -0.021858
HOUR_APPR_PROCESS_START        -0.022945
BASEMENTAREA_MEDI              -0.023122
BASEMENTAREA_AVG               -0.023834
LIVINGAPARTMENTS_MODE          -0.024955
YEARS_BUILD_MODE               -0.025586
YEARS_BUILD_AVG                -0.025685
LIVINGAPARTMENTS_MEDI          -0.025916
YEARS_BUILD_MEDI               -0.025933
LIVINGAPARTMENTS_AVG           -0.026580
APARTMENTS_MODE                -0.029427
AMT_CREDIT                     -0.030187
APARTMENTS_MEDI                -0.031137
APARTMENTS_AVG                 -0.031644
LIVINGAREA_MODE                -0.032972
FLOORSMIN_MODE                 -0.033119
FLOORSMIN_MEDI                 -0.033636
FLOORSMIN_AVG                  -0.033705
ELEVATORS_MODE                 -0.034306
LIVINGAREA_MEDI                -0.034857
LIVINGAREA_AVG                 -0.035242
TOTALAREA_MODE                 -0.035540
ELEVATORS_MEDI                 -0.035791
ELEVATORS_AVG                  -0.036381
REGION_POPULATION_RELATIVE     -0.037004
AMT_GOODS_PRICE                -0.039304
DAYS_EMPLOYED                  -0.045064
FLOORSMAX_MODE                 -0.045368
FLOORSMAX_MEDI                 -0.045861
FLOORSMAX_AVG                  -0.046041
EXT_SOURCE_1                   -0.155781
EXT_SOURCE_2                   -0.159698
EXT_SOURCE_3                   -0.180865
Name: TARGET, dtype: float64

No variable explains much of TARGET on its own (the strongest correlation, EXT_SOURCE_3, is only about -0.18), which seems reasonable for a problem as complex as detecting loan repayment difficulties.
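Because the list above is sorted by signed value, the strongest relationships (the negative EXT_SOURCE correlations) end up at the bottom. A small sketch of ranking by absolute value instead, using a handful of coefficients copied from the output above; the variable names `target_corr` and `ranked` are only illustrative:

```python
import pandas as pd

# A few TARGET correlations copied from the output above.
target_corr = pd.Series({
    "DAYS_BIRTH": 0.078418,
    "REGION_RATING_CLIENT_W_CITY": 0.059963,
    "AMT_INCOME_TOTAL": -0.002481,
    "EXT_SOURCE_1": -0.155781,
    "EXT_SOURCE_2": -0.159698,
    "EXT_SOURCE_3": -0.180865,
})

# Rank by strength of the linear relationship, ignoring its sign.
ranked = target_corr.sort_values(key=lambda s: s.abs(), ascending=False)
print(ranked)
```

Applied to the full series, this would presumably be `corr.loc['TARGET'].drop('TARGET').sort_values(key=lambda s: s.abs(), ascending=False)`.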

Handling missing values

How to handle missing values depends on the context we are working in, the nature of the data, and the impact the missing values may have on the analysis or on the machine learning model. In general, there are several options for imputing them:

  1. Numerical variables: impute the mean if the variable is approximately normally distributed, or the median if it has outliers. Alternatives are a fixed, predetermined value or a more advanced imputation algorithm such as KNN, which predicts the missing values from the values of the other columns.

  2. Categorical variables: impute the mode when one value clearly dominates, or assign a fixed label such as 'Desconocido' ("Unknown").
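To make the difference between the numerical options concrete, here is a minimal sketch comparing median imputation with KNN imputation on a toy frame. The data, the column names and `n_neighbors=2` are assumptions chosen for illustration; `SimpleImputer` and `KNNImputer` are the scikit-learn classes.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer, SimpleImputer

# Toy data: the last row is missing "income"; row 3 is an outlier.
toy = pd.DataFrame({
    "income": [30.0, 40.0, 50.0, 200.0, np.nan],
    "credit": [100.0, 110.0, 300.0, 900.0, 108.0],
})

# Median imputation: one global value per column, robust to the outlier.
median_imputed = pd.DataFrame(
    SimpleImputer(strategy="median").fit_transform(toy), columns=toy.columns
)

# KNN imputation: average "income" of the 2 rows closest in "credit".
knn_imputed = pd.DataFrame(
    KNNImputer(n_neighbors=2).fit_transform(toy), columns=toy.columns
)

print(median_imputed.loc[4, "income"])  # median of 30, 40, 50, 200
print(knn_imputed.loc[4, "income"])     # mean of the two nearest rows (credit 110 and 100)
```

The median ignores the other columns entirely, while KNN uses them to pick similar rows. KNN is also considerably more expensive on a dataset of this size.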

In my case, since I do not have much context about the variables, I will impute missing values in the categorical variables with the fixed label 'Desconocido' ("Unknown"), because we do not really know why those values are missing. I prefer not to impute the mode: in some categorical variables no single value clearly dominates, so doing so could distort their distribution.

For the numerical variables I will impute the median. Most of them are not normally distributed and, although the share of outliers is not large, the median is not affected by extreme values, unlike the mean. Many machine learning models are also sensitive to extreme values, so using the median reduces the risk that the imputed values introduce unwanted noise or bias.

For the boolean variables, which only take the values 0 or 1, I will impute the mode. The mode always returns one of the two observed values, whereas the median can fall between them (0.5) when both are equally frequent.
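The three rules above (median for numerical, mode for boolean, a fixed label for categorical) can be sketched as a single set of fill values learned on the training set and reused on the test set. This is an illustrative sketch rather than the notebook's implementation: the helper `fit_imputation_values`, the toy frames and the column groupings are assumptions.

```python
import numpy as np
import pandas as pd

def fit_imputation_values(train, num_cols, bool_cols, cat_cols,
                          cat_fill="Desconocido"):
    """Learn one fill value per column from the training set only."""
    fill = {}
    fill.update(train[num_cols].median().to_dict())         # median: robust to outliers
    fill.update(train[bool_cols].mode().iloc[0].to_dict())  # mode: stays within {0, 1}
    fill.update({col: cat_fill for col in cat_cols})        # explicit "unknown" label
    return fill

# Toy train/test frames (values invented for illustration).
train = pd.DataFrame({
    "AMT_ANNUITY": [10.0, 12.0, np.nan, 500.0],
    "FLAG_PHONE": [0.0, 0.0, 1.0, np.nan],
    "OCCUPATION_TYPE": ["Laborers", None, "Drivers", "Laborers"],
})
test = pd.DataFrame({
    "AMT_ANNUITY": [np.nan],
    "FLAG_PHONE": [np.nan],
    "OCCUPATION_TYPE": [None],
})

fill_values = fit_imputation_values(
    train, ["AMT_ANNUITY"], ["FLAG_PHONE"], ["OCCUPATION_TYPE"]
)

# The same training-set statistics are applied to both sets.
train_imputed = train.fillna(value=fill_values)
test_imputed = test.fillna(value=fill_values)
print(fill_values)
```

Learning the fill values on the training split only keeps information from the test split out of the preprocessing step.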

In [28]:
list_cat_vars, other = f_aux.dame_variables_categoricas(dataset=df_loan_train)

# Make sure columns with categorical dtype accept the 'Desconocido' category
for col in list_cat_vars:
    if isinstance(df_loan_train[col].dtype, pd.CategoricalDtype):
        # Add 'Desconocido' as a category only if it does not exist yet
        if 'Desconocido' not in df_loan_train[col].cat.categories:
            df_loan_train[col] = df_loan_train[col].cat.add_categories(['Desconocido'])

# Impute null values with 'Desconocido'
df_loan_train[list_cat_vars] = df_loan_train[list_cat_vars].fillna(value='Desconocido')

df_loan_train[list_cat_vars]
Out[28]:
FONDKAPREMONT_MODE WALLSMATERIAL_MODE HOUSETYPE_MODE EMERGENCYSTATE_MODE OCCUPATION_TYPE NAME_TYPE_SUITE HOUR_APPR_PROCESS_START REG_REGION_NOT_LIVE_REGION ORGANIZATION_TYPE NAME_CONTRACT_TYPE FLAG_OWN_CAR CODE_GENDER CNT_CHILDREN NAME_INCOME_TYPE NAME_FAMILY_STATUS NAME_HOUSING_TYPE NAME_EDUCATION_TYPE FLAG_MOBIL FLAG_EMP_PHONE FLAG_WORK_PHONE FLAG_CONT_MOBILE FLAG_OWN_REALTY LIVE_REGION_NOT_WORK_REGION FLAG_EMAIL REGION_RATING_CLIENT REGION_RATING_CLIENT_W_CITY WEEKDAY_APPR_PROCESS_START FLAG_PHONE REG_CITY_NOT_LIVE_CITY REG_CITY_NOT_WORK_CITY LIVE_CITY_NOT_WORK_CITY REG_REGION_NOT_WORK_REGION FLAG_DOCUMENT_4 FLAG_DOCUMENT_5 FLAG_DOCUMENT_2 FLAG_DOCUMENT_3 FLAG_DOCUMENT_11 FLAG_DOCUMENT_10 FLAG_DOCUMENT_9 FLAG_DOCUMENT_8 FLAG_DOCUMENT_7 FLAG_DOCUMENT_6 FLAG_DOCUMENT_12 FLAG_DOCUMENT_13 FLAG_DOCUMENT_19 FLAG_DOCUMENT_18 FLAG_DOCUMENT_17 FLAG_DOCUMENT_16 FLAG_DOCUMENT_15 FLAG_DOCUMENT_14 FLAG_DOCUMENT_20 FLAG_DOCUMENT_21 TARGET
226983 reg oper account Panel block of flats No Laborers Children 16 0 Industry: type 3 Cash loans N F 0 Working Civil marriage House / apartment Secondary / secondary special 1 1 0 1 Y 0 0 2 2 WEDNESDAY 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
199779 Desconocido Desconocido Desconocido Desconocido Security staff Unaccompanied 10 0 Security Revolving loans N M 0 Working Married House / apartment Secondary / secondary special 1 1 0 1 Y 0 0 2 2 TUESDAY 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
302311 Desconocido Desconocido Desconocido Desconocido Desconocido Unaccompanied 9 0 Business Entity Type 3 Cash loans Y F 0 Commercial associate Married House / apartment Higher education 1 1 1 1 Y 0 0 2 2 THURSDAY 0 0 1 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
214722 reg oper account Panel block of flats No Managers Unaccompanied 13 0 Legal Services Cash loans N F 0 State servant Married House / apartment Higher education 1 1 0 1 Y 0 0 2 2 THURSDAY 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
225424 Desconocido Desconocido Desconocido Desconocido Desconocido Family 11 0 Business Entity Type 3 Cash loans Y M 1 Working Married House / apartment Secondary / secondary special 1 1 0 1 Y 0 0 2 2 SATURDAY 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
304704 reg oper account Block block of flats No High skill tech staff Unaccompanied 8 0 Business Entity Type 2 Cash loans N F 0 Working Single / not married House / apartment Secondary / secondary special 1 1 0 1 Y 0 0 2 2 WEDNESDAY 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
300681 Desconocido Desconocido Desconocido Desconocido Low-skill Laborers Spouse, partner 19 0 Industry: type 3 Revolving loans N M 0 Working Married House / apartment Secondary / secondary special 1 1 1 1 Y 0 0 2 2 FRIDAY 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
272530 reg oper account Panel block of flats No Managers Unaccompanied 12 0 Other Cash loans Y M 0 Working Married House / apartment Secondary / secondary special 1 1 0 1 Y 0 0 2 2 SUNDAY 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
121945 Desconocido Stone, brick block of flats No Cleaning staff Unaccompanied 15 1 University Cash loans N F 0 Working Single / not married House / apartment Secondary / secondary special 1 1 1 1 Y 0 0 2 2 SUNDAY 1 1 1 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
68902 Desconocido Desconocido Desconocido Desconocido Desconocido Unaccompanied 9 0 Industry: type 11 Cash loans N M 0 Commercial associate Civil marriage House / apartment Higher education 1 1 0 1 Y 0 0 2 2 THURSDAY 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

246008 rows × 53 columns
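The category check in the cell above matters only for columns stored with pandas' `category` dtype: such a column rejects fill values that are not among its declared categories. The sketch below illustrates this on a toy Series (the data is invented; the exact exception type is assumed to vary between pandas versions, so both `TypeError` and `ValueError` are caught).

```python
import pandas as pd

# Toy categorical column with one missing value
s = pd.Series(["Y", None, "N"], dtype="category")

# Filling with a label that is not a declared category is rejected
try:
    s.fillna("Desconocido")
    raised = False
except (TypeError, ValueError):
    raised = True

# Registering the label first makes the fill valid
if "Desconocido" not in s.cat.categories:
    s = s.cat.add_categories(["Desconocido"])
filled = s.fillna("Desconocido")

print(raised)
print(filled.tolist())
```

Columns with plain `object` dtype do not need this step, which is why the loop only touches categorical-dtype columns.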

In [29]:
for col in list_cat_vars:
    if isinstance(df_loan_test[col].dtype, pd.CategoricalDtype):
        # Add 'Desconocido' as a category only if it does not exist yet
        if 'Desconocido' not in df_loan_test[col].cat.categories:
            df_loan_test[col] = df_loan_test[col].cat.add_categories(['Desconocido'])

# Impute null values with 'Desconocido'
df_loan_test[list_cat_vars] = df_loan_test[list_cat_vars].fillna(value='Desconocido')

df_loan_test[list_cat_vars]
Out[29]:
FONDKAPREMONT_MODE WALLSMATERIAL_MODE HOUSETYPE_MODE EMERGENCYSTATE_MODE OCCUPATION_TYPE NAME_TYPE_SUITE HOUR_APPR_PROCESS_START REG_REGION_NOT_LIVE_REGION ORGANIZATION_TYPE NAME_CONTRACT_TYPE FLAG_OWN_CAR CODE_GENDER CNT_CHILDREN NAME_INCOME_TYPE NAME_FAMILY_STATUS NAME_HOUSING_TYPE NAME_EDUCATION_TYPE FLAG_MOBIL FLAG_EMP_PHONE FLAG_WORK_PHONE FLAG_CONT_MOBILE FLAG_OWN_REALTY LIVE_REGION_NOT_WORK_REGION FLAG_EMAIL REGION_RATING_CLIENT REGION_RATING_CLIENT_W_CITY WEEKDAY_APPR_PROCESS_START FLAG_PHONE REG_CITY_NOT_LIVE_CITY REG_CITY_NOT_WORK_CITY LIVE_CITY_NOT_WORK_CITY REG_REGION_NOT_WORK_REGION FLAG_DOCUMENT_4 FLAG_DOCUMENT_5 FLAG_DOCUMENT_2 FLAG_DOCUMENT_3 FLAG_DOCUMENT_11 FLAG_DOCUMENT_10 FLAG_DOCUMENT_9 FLAG_DOCUMENT_8 FLAG_DOCUMENT_7 FLAG_DOCUMENT_6 FLAG_DOCUMENT_12 FLAG_DOCUMENT_13 FLAG_DOCUMENT_19 FLAG_DOCUMENT_18 FLAG_DOCUMENT_17 FLAG_DOCUMENT_16 FLAG_DOCUMENT_15 FLAG_DOCUMENT_14 FLAG_DOCUMENT_20 FLAG_DOCUMENT_21 TARGET
153756 Desconocido Panel block of flats No Desconocido Unaccompanied 13 0 XNA Cash loans N F 0 Pensioner Married House / apartment Secondary / secondary special 1 0 0 1 Y 0 0 2 2 SUNDAY 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
275524 Desconocido Stone, brick block of flats No Desconocido Unaccompanied 8 0 Government Cash loans N F 0 Working Widow House / apartment Secondary / secondary special 1 1 1 1 Y 0 0 3 3 TUESDAY 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
138520 Desconocido Desconocido Desconocido Desconocido Laborers Unaccompanied 11 0 Business Entity Type 3 Cash loans N M 0 Working Married House / apartment Secondary / secondary special 1 1 1 1 Y 0 0 2 2 MONDAY 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
244761 Desconocido Panel block of flats No Core staff Unaccompanied 19 0 Government Cash loans N F 0 Working Single / not married Municipal apartment Higher education 1 1 0 1 Y 0 0 1 1 WEDNESDAY 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
74161 org spec account Stone, brick block of flats No Sales staff Unaccompanied 17 0 Business Entity Type 3 Cash loans N M 0 Commercial associate Single / not married With parents Higher education 1 1 0 1 N 0 0 2 2 TUESDAY 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
208867 Desconocido Desconocido Desconocido Desconocido Desconocido Spouse, partner 10 0 XNA Cash loans N F 0 Pensioner Married House / apartment Secondary / secondary special 1 0 0 1 Y 0 0 2 2 WEDNESDAY 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
96975 Desconocido Desconocido Desconocido Desconocido Laborers Unaccompanied 7 0 Business Entity Type 3 Cash loans N F 0 Working Married House / apartment Secondary / secondary special 1 1 0 1 Y 0 0 3 3 WEDNESDAY 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
270299 Desconocido Desconocido Desconocido Desconocido Accountants Unaccompanied 10 0 Government Revolving loans N F 0 State servant Married House / apartment Higher education 1 1 1 1 N 0 0 2 2 WEDNESDAY 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
241114 reg oper account Panel block of flats No Desconocido Unaccompanied 10 0 Business Entity Type 1 Cash loans Y F 1 Working Married House / apartment Secondary / secondary special 1 1 1 1 N 0 0 2 2 THURSDAY 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
55852 Desconocido Desconocido Desconocido Desconocido Cleaning staff Unaccompanied 11 0 Business Entity Type 3 Cash loans N F 0 Commercial associate Married House / apartment Secondary / secondary special 1 1 1 1 Y 0 0 2 2 MONDAY 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

61503 rows × 53 columns

We observe no null values in the boolean columns. If there were any and we needed to replace them with the mode, we could use the loop left commented out in the following cell.

In [30]:
df_loan_train[df_loan_bool].isnull().sum()

# for col in df_loan_bool:
#     # Compute the mode of the column
#     moda = df_loan_train[col].mode()[0]
#     # Replace the null values with the mode
#     df_loan_train[col] = df_loan_train[col].fillna(moda)
Out[30]:
REG_REGION_NOT_LIVE_REGION     0
FLAG_MOBIL                     0
FLAG_EMP_PHONE                 0
FLAG_WORK_PHONE                0
FLAG_CONT_MOBILE               0
TARGET                         0
LIVE_REGION_NOT_WORK_REGION    0
FLAG_EMAIL                     0
FLAG_PHONE                     0
REG_CITY_NOT_LIVE_CITY         0
REG_CITY_NOT_WORK_CITY         0
LIVE_CITY_NOT_WORK_CITY        0
REG_REGION_NOT_WORK_REGION     0
FLAG_DOCUMENT_4                0
FLAG_DOCUMENT_5                0
FLAG_DOCUMENT_2                0
FLAG_DOCUMENT_3                0
FLAG_DOCUMENT_11               0
FLAG_DOCUMENT_10               0
FLAG_DOCUMENT_9                0
FLAG_DOCUMENT_8                0
FLAG_DOCUMENT_7                0
FLAG_DOCUMENT_6                0
FLAG_DOCUMENT_12               0
FLAG_DOCUMENT_13               0
FLAG_DOCUMENT_19               0
FLAG_DOCUMENT_18               0
FLAG_DOCUMENT_17               0
FLAG_DOCUMENT_16               0
FLAG_DOCUMENT_15               0
FLAG_DOCUMENT_14               0
FLAG_DOCUMENT_20               0
FLAG_DOCUMENT_21               0
dtype: int64
In [20]:
# Impute null values in numerical columns with the median
for col in df_loan_train.select_dtypes(include=['number']).columns:
    # Compute the median of the column
    mediana = df_loan_train[col].median()
    # Replace the null values with the median
    df_loan_train[col] = df_loan_train[col].fillna(mediana)

df_loan_train[df_loan_num].head(10)
Out[20]:
SK_ID_CURR COMMONAREA_AVG COMMONAREA_MEDI COMMONAREA_MODE NONLIVINGAPARTMENTS_AVG NONLIVINGAPARTMENTS_MEDI NONLIVINGAPARTMENTS_MODE LIVINGAPARTMENTS_MEDI LIVINGAPARTMENTS_AVG LIVINGAPARTMENTS_MODE FLOORSMIN_AVG YEARS_BUILD_MODE YEARS_BUILD_MEDI YEARS_BUILD_AVG OWN_CAR_AGE LANDAREA_MEDI LANDAREA_AVG LANDAREA_MODE BASEMENTAREA_MODE BASEMENTAREA_AVG BASEMENTAREA_MEDI EXT_SOURCE_1 NONLIVINGAREA_AVG NONLIVINGAREA_MODE NONLIVINGAREA_MEDI ELEVATORS_AVG APARTMENTS_AVG APARTMENTS_MODE APARTMENTS_MEDI ENTRANCES_AVG LIVINGAREA_AVG LIVINGAREA_MODE LIVINGAREA_MEDI FLOORSMAX_AVG FLOORSMAX_MEDI YEARS_BEGINEXPLUATATION_MODE YEARS_BEGINEXPLUATATION_AVG YEARS_BEGINEXPLUATATION_MEDI TOTALAREA_MODE EXT_SOURCE_3 EXT_SOURCE_2 AMT_GOODS_PRICE AMT_ANNUITY DAYS_LAST_PHONE_CHANGE ORGANIZATION_TYPE AMT_CREDIT AMT_INCOME_TOTAL REGION_POPULATION_RELATIVE DAYS_BIRTH DAYS_EMPLOYED DAYS_REGISTRATION DAYS_ID_PUBLISH
77061 189359 0.0210 0.0208 0.0190 0.0000 0.0000 0.0000 0.0761 0.0756 0.0771 0.2083 0.7648 0.7585 0.7552 3.0 0.0487 0.0482 0.0459 0.0746 0.0763 0.0759 0.270597 0.1658 0.1755 0.1693 0.32 0.3351 0.3414 0.3383 0.2759 0.3706 0.3861 0.3773 0.3750 0.3750 0.9876 0.9876 0.9876 0.3636 0.535276 0.482504 270000.0 13500.0 -159.0 Business Entity Type 3 270000.0 135000.0 0.010643 -12189 -805 -4417.0 -677
275131 418867 0.0142 0.0142 0.0143 0.0000 0.0000 0.0000 0.0770 0.0756 0.0826 0.2083 0.7648 0.7585 0.7552 9.0 0.0639 0.0629 0.0643 0.0794 0.0765 0.0765 0.232782 0.0000 0.0000 0.0000 0.00 0.0928 0.0945 0.0937 0.2069 0.0761 0.0793 0.0774 0.1667 0.1667 0.9821 0.9821 0.9821 0.0807 0.055333 0.483454 828000.0 26707.5 -362.0 Security 828000.0 270000.0 0.008019 -15494 -854 -6733.0 -3988
140909 263377 0.0210 0.0208 0.0190 0.0000 0.0000 0.0000 0.0761 0.0756 0.0771 0.2083 0.7648 0.7585 0.7552 6.0 0.0487 0.0482 0.0459 0.0746 0.0763 0.0759 0.505892 0.0036 0.0011 0.0030 0.00 0.0876 0.0840 0.0864 0.1379 0.0744 0.0731 0.0748 0.1667 0.1667 0.9816 0.9816 0.9816 0.0687 0.511892 0.617727 463500.0 50616.0 -208.0 Construction 500211.0 225000.0 0.007330 -17292 -1923 -1647.0 -818
229664 366006 0.0210 0.0208 0.0190 0.0000 0.0000 0.0000 0.0761 0.0756 0.0771 0.2083 0.7648 0.7585 0.7552 9.0 0.0487 0.0482 0.0459 0.1902 0.1833 0.1833 0.505892 0.0362 0.0383 0.0369 0.28 0.2577 0.2626 0.2602 0.2414 0.1487 0.1550 0.1514 0.3333 0.3333 0.9846 0.9846 0.9846 0.2003 0.535276 0.704520 135000.0 6750.0 -1636.0 Business Entity Type 3 135000.0 135000.0 0.046220 -21915 -1075 -11335.0 -4154
84374 197882 0.0210 0.0208 0.0190 0.0000 0.0000 0.0000 0.0761 0.0756 0.0771 0.2083 0.7648 0.7585 0.7552 9.0 0.0487 0.0482 0.0459 0.0746 0.0763 0.0759 0.505892 0.0036 0.0011 0.0030 0.00 0.0876 0.0840 0.0864 0.1379 0.0744 0.0731 0.0748 0.1667 0.1667 0.9816 0.9816 0.9816 0.0687 0.535276 0.677117 675000.0 30528.0 0.0 Housing 942300.0 67500.0 0.011657 -14687 -3125 -6125.0 -4317
77286 189614 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0975 0.0958 0.1047 0.4167 0.8236 0.8189 0.8164 9.0 0.0000 0.0000 0.0000 0.0755 0.0727 0.0727 0.834351 0.1188 0.1258 0.1213 0.28 0.1175 0.1197 0.1187 0.1379 0.1391 0.1450 0.1416 0.3750 0.3750 0.9866 0.9866 0.9866 0.1353 0.684828 0.703996 1048500.0 35239.5 -1457.0 Business Entity Type 3 1200744.0 180000.0 0.072508 -22309 -6728 -9218.0 -4557
297454 444613 0.0210 0.0208 0.0190 0.0039 0.0039 0.0039 0.0958 0.0941 0.1028 0.7917 0.9477 0.9463 0.9456 2.0 0.0487 0.0482 0.0459 0.0746 0.0763 0.0759 0.505892 0.0036 0.0011 0.0030 0.08 0.1165 0.1187 0.1176 0.0345 0.1253 0.1306 0.1276 0.7500 0.7500 0.9960 0.9960 0.9960 0.1256 0.577969 0.558124 562500.0 14967.0 -219.0 Business Entity Type 3 562500.0 675000.0 0.028663 -12723 -2390 -286.0 -4704
56173 165091 0.0210 0.0208 0.0190 0.0000 0.0000 0.0000 0.0761 0.0756 0.0771 0.2083 0.7648 0.7585 0.7552 10.0 0.0487 0.0482 0.0459 0.0746 0.0763 0.0759 0.505892 0.0036 0.0011 0.0030 0.00 0.0876 0.0840 0.0864 0.1379 0.0744 0.0731 0.0748 0.1667 0.1667 0.9816 0.9816 0.9816 0.0687 0.535276 0.185789 180000.0 10179.0 -3068.0 XNA 180000.0 90000.0 0.020713 -21738 365243 -3323.0 -5213
86413 200287 0.0210 0.0208 0.0190 0.0000 0.0000 0.0000 0.0761 0.0756 0.0771 0.2083 0.7648 0.7585 0.7552 9.0 0.0487 0.0482 0.0459 0.0746 0.0763 0.0759 0.505892 0.0000 0.0000 0.0000 0.32 0.0969 0.0987 0.0978 0.1379 0.0893 0.0930 0.0909 0.5417 0.5417 0.9926 0.9925 0.9925 0.0702 0.475850 0.661936 270000.0 11718.0 -321.0 Business Entity Type 2 312768.0 157500.0 0.026392 -19661 -5198 -4862.0 -3144
147461 270975 0.0210 0.0208 0.0190 0.0000 0.0000 0.0000 0.0761 0.0756 0.0771 0.2083 0.7648 0.7585 0.7552 15.0 0.0487 0.0482 0.0459 0.0746 0.0763 0.0759 0.453667 0.0036 0.0011 0.0030 0.00 0.0247 0.0252 0.0250 0.0690 0.0744 0.0731 0.0748 0.0833 0.0833 0.9777 0.9776 0.9776 0.0148 0.691021 0.702847 1534500.0 47209.5 -2.0 Hotel 1716799.5 405000.0 0.046220 -14268 -398 -6528.0 -4856
In [21]:
for col in df_loan_test.select_dtypes(include=['number']).columns:
    # Compute the median of the column (taken from the test set itself)
    mediana = df_loan_test[col].median()
    # Replace the null values with the median
    df_loan_test[col] = df_loan_test[col].fillna(mediana)

df_loan_test[df_loan_num].head(10)
Out[21]:
SK_ID_CURR COMMONAREA_AVG COMMONAREA_MEDI COMMONAREA_MODE NONLIVINGAPARTMENTS_AVG NONLIVINGAPARTMENTS_MEDI NONLIVINGAPARTMENTS_MODE LIVINGAPARTMENTS_MEDI LIVINGAPARTMENTS_AVG LIVINGAPARTMENTS_MODE FLOORSMIN_AVG YEARS_BUILD_MODE YEARS_BUILD_MEDI YEARS_BUILD_AVG OWN_CAR_AGE LANDAREA_MEDI LANDAREA_AVG LANDAREA_MODE BASEMENTAREA_MODE BASEMENTAREA_AVG BASEMENTAREA_MEDI EXT_SOURCE_1 NONLIVINGAREA_AVG NONLIVINGAREA_MODE NONLIVINGAREA_MEDI ELEVATORS_AVG APARTMENTS_AVG APARTMENTS_MODE APARTMENTS_MEDI ENTRANCES_AVG LIVINGAREA_AVG LIVINGAREA_MODE LIVINGAREA_MEDI FLOORSMAX_AVG FLOORSMAX_MEDI YEARS_BEGINEXPLUATATION_MODE YEARS_BEGINEXPLUATATION_AVG YEARS_BEGINEXPLUATATION_MEDI TOTALAREA_MODE EXT_SOURCE_3 EXT_SOURCE_2 AMT_GOODS_PRICE AMT_ANNUITY DAYS_LAST_PHONE_CHANGE ORGANIZATION_TYPE AMT_CREDIT AMT_INCOME_TOTAL REGION_POPULATION_RELATIVE DAYS_BIRTH DAYS_EMPLOYED DAYS_REGISTRATION DAYS_ID_PUBLISH
144561 267624 0.0166 0.0168 0.0168 0.0 0.0 0.0 0.1026 0.1009 0.1102 0.0417 0.7125 0.7048 0.7008 16.0 0.0000 0.0000 0.00000 0.1353 0.1304 0.1304 0.263461 0.0000 0.0000 0.0000 0.00 0.1237 0.1261 0.1249 0.2759 0.1167 0.1216 0.1188 0.1667 0.1667 0.9782 0.9781 0.9781 0.1009 0.309275 0.563516 135000.0 7879.5 -2331.0 Other 135000.0 157500.0 0.018209 -15733 -3330 -3168.0 -4032
218685 353356 0.0080 0.0081 0.0081 0.0 0.0 0.0 0.0599 0.0588 0.0643 0.3750 0.8497 0.8457 0.8436 9.0 0.0546 0.0536 0.05490 0.0628 0.0606 0.0606 0.920857 0.0000 0.0000 0.0000 0.08 0.0722 0.0735 0.0729 0.0690 0.0740 0.0771 0.0753 0.3333 0.3333 0.9886 0.9886 0.9886 0.0626 0.526295 0.734832 1800000.0 49500.0 -817.0 Business Entity Type 1 1800000.0 382500.0 0.046220 -14720 -4833 -7602.0 -4315
78843 191398 0.0213 0.0211 0.0193 0.0 0.0 0.0 0.0761 0.0756 0.0771 0.2083 0.7648 0.7585 0.7552 9.0 0.0484 0.0480 0.04555 0.0746 0.0763 0.0757 0.506423 0.0037 0.0012 0.0032 0.00 0.0876 0.0840 0.0874 0.1379 0.0748 0.0732 0.0751 0.1667 0.1667 0.9816 0.9816 0.9816 0.0690 0.000527 0.614702 1125000.0 37800.0 -1477.0 Transport: type 2 1288350.0 202500.0 0.030755 -14247 -5966 -4294.0 -602
3236 103777 0.0203 0.0205 0.0205 0.0 0.0 0.0 0.0923 0.0908 0.0992 0.3750 0.9216 0.9195 0.9184 9.0 0.0816 0.0802 0.08200 0.0462 0.0648 0.0678 0.506423 0.0000 0.0000 0.0000 0.08 0.1017 0.1134 0.1124 0.0690 0.0900 0.0883 0.0943 0.3471 0.3333 0.9940 0.9930 0.9940 0.0746 0.579727 0.641314 774000.0 27549.0 -1438.0 Transport: type 4 774000.0 112500.0 0.028663 -15376 -1388 -5113.0 -4581
104780 221590 0.0213 0.0211 0.0193 0.0 0.0 0.0 0.0761 0.0756 0.0771 0.2083 0.7648 0.7585 0.7552 9.0 0.0484 0.0480 0.04555 0.0746 0.0763 0.0757 0.591484 0.0037 0.0012 0.0032 0.00 0.0876 0.0840 0.0874 0.1379 0.0748 0.0732 0.0751 0.1667 0.1667 0.9816 0.9816 0.9816 0.0690 0.746300 0.569096 315000.0 15750.0 -1244.0 Self-employed 315000.0 112500.0 0.018029 -17029 -1860 -6740.0 -577
111137 228941 0.0213 0.0211 0.0193 0.0 0.0 0.0 0.0761 0.0756 0.0771 0.2083 0.7648 0.7585 0.7552 9.0 0.0484 0.0480 0.04555 0.0746 0.0763 0.0757 0.506423 0.0037 0.0012 0.0032 0.00 0.0876 0.0840 0.0874 0.1379 0.0748 0.0732 0.0751 0.1667 0.1667 0.9816 0.9816 0.9816 0.0690 0.112474 0.581438 382500.0 19125.0 -293.0 School 382500.0 135000.0 0.018801 -19628 -9683 -11727.0 -3151
224410 359919 0.0213 0.0211 0.0193 0.0 0.0 0.0 0.0761 0.0756 0.0771 0.2083 0.7648 0.7585 0.7552 9.0 0.0484 0.0480 0.04555 0.0746 0.0763 0.0757 0.682831 0.0037 0.0012 0.0032 0.00 0.0876 0.0840 0.0874 0.1379 0.0748 0.0732 0.0751 0.1667 0.1667 0.9816 0.9816 0.9816 0.0690 0.141992 0.068473 454500.0 20596.5 -1564.0 XNA 634482.0 67500.0 0.020246 -20672 365243 -5254.0 -4153
263223 404784 0.0213 0.0211 0.0193 0.0 0.0 0.0 0.0761 0.0756 0.0771 0.2083 0.7648 0.7585 0.7552 9.0 0.0484 0.0480 0.04555 0.0746 0.0763 0.0757 0.506423 0.0037 0.0012 0.0032 0.00 0.0309 0.0315 0.0312 0.0690 0.0247 0.0257 0.0251 0.1667 0.1667 0.9891 0.9891 0.9891 0.0221 0.835777 0.315667 225000.0 15165.0 -46.0 Business Entity Type 3 225000.0 103500.0 0.004960 -16528 -808 -6544.0 -67
210449 343870 0.0213 0.0211 0.0193 0.0 0.0 0.0 0.0761 0.0756 0.0771 0.2083 0.7648 0.7585 0.7552 9.0 0.0484 0.0480 0.04555 0.0746 0.0763 0.0757 0.506423 0.0037 0.0012 0.0032 0.00 0.0876 0.0840 0.0874 0.1379 0.0748 0.0732 0.0751 0.1667 0.1667 0.9816 0.9816 0.9816 0.0690 0.417100 0.570720 661500.0 29268.0 0.0 XNA 661500.0 81000.0 0.011703 -23472 365243 -13525.0 -4389
45434 152622 0.0213 0.0211 0.0193 0.0 0.0 0.0 0.0761 0.0756 0.0771 0.2083 0.7648 0.7585 0.7552 13.0 0.0484 0.0480 0.04555 0.0746 0.0763 0.0757 0.203146 0.0037 0.0012 0.0032 0.00 0.0876 0.0840 0.0874 0.1379 0.0748 0.0732 0.0751 0.1667 0.1667 0.9816 0.9816 0.9816 0.0690 0.513694 0.569905 450000.0 41404.5 -2388.0 Security Ministries 450000.0 360000.0 0.018209 -10191 -2368 -1210.0 -1246
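The two cells above compute the medians separately on train and on test, which appears to be why the repeated fill values differ slightly between the two outputs (for example 0.0210 versus 0.0213 in COMMONAREA_AVG). A common alternative, sketched here on invented toy data, is to learn the medians on the training set only and reuse them for the test set, so that no information from the test set enters the preprocessing. The frames `train` and `test` below are hypothetical stand-ins for `df_loan_train` and `df_loan_test`.

```python
import numpy as np
import pandas as pd

# Toy stand-ins for the train and test splits (hypothetical values)
train = pd.DataFrame({"AMT_ANNUITY": [10000.0, np.nan, 30000.0, 20000.0]})
test = pd.DataFrame({"AMT_ANNUITY": [np.nan, 50000.0]})

# Learn one median per numerical column from the training set only
num_cols = train.select_dtypes(include=["number"]).columns
train_medians = train[num_cols].median()

# Apply the same training medians to both splits
train[num_cols] = train[num_cols].fillna(train_medians)
test[num_cols] = test[num_cols].fillna(train_medians)

print(train_medians)
print(test)
```

Passing a Series of medians to `fillna` fills each column with its own value, so the explicit loop over columns is not needed in this variant.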
In [23]:
f_aux.get_percent_null_values_target(df_loan_train, list(list_var_continuous), target='TARGET')
f_aux.get_percent_null_values_target(df_loan_test, list(list_var_continuous), target='TARGET')
No existen variables con valores nulos
No existen variables con valores nulos
Out[23]:

This confirms that all null values were imputed successfully in both the train and test sets.

Correlation matrix for categorical variables: Cramér's V matrix¶

Because Pearson's correlation coefficient is not suitable for categorical variables, we use Cramér's V as the closest equivalent. It is a chi-square-based measure of association that ranges from 0 (no association) to 1 (perfect association), and it lets us examine how strongly our categorical variables are related.

Although our boolean variables take the values 0 or 1 and are therefore stored as numbers, their origin and interpretation are categorical: a value of 0 belongs to a different category than a value of 1. We will therefore treat them as categorical and measure their association with Cramér's V.

In [24]:
df_cat_bool = pd.concat([df_loan_train[df_loan_cat], df_loan_train[df_loan_bool]], axis=1)
df_cat_bool.columns.values
Out[24]:
array(['FONDKAPREMONT_MODE', 'FLOORSMIN_MODE', 'FLOORSMIN_MEDI',
       'ELEVATORS_MEDI', 'ELEVATORS_MODE', 'WALLSMATERIAL_MODE',
       'ENTRANCES_MEDI', 'ENTRANCES_MODE', 'HOUSETYPE_MODE',
       'FLOORSMAX_MODE', 'EMERGENCYSTATE_MODE', 'OCCUPATION_TYPE',
       'AMT_REQ_CREDIT_BUREAU_WEEK', 'AMT_REQ_CREDIT_BUREAU_MON',
       'AMT_REQ_CREDIT_BUREAU_HOUR', 'AMT_REQ_CREDIT_BUREAU_DAY',
       'AMT_REQ_CREDIT_BUREAU_YEAR', 'AMT_REQ_CREDIT_BUREAU_QRT',
       'NAME_TYPE_SUITE', 'OBS_60_CNT_SOCIAL_CIRCLE',
       'DEF_60_CNT_SOCIAL_CIRCLE', 'OBS_30_CNT_SOCIAL_CIRCLE',
       'DEF_30_CNT_SOCIAL_CIRCLE', 'CNT_FAM_MEMBERS',
       'HOUR_APPR_PROCESS_START', 'NAME_CONTRACT_TYPE', 'FLAG_OWN_CAR',
       'CODE_GENDER', 'CNT_CHILDREN', 'NAME_INCOME_TYPE',
       'NAME_FAMILY_STATUS', 'NAME_HOUSING_TYPE', 'NAME_EDUCATION_TYPE',
       'FLAG_OWN_REALTY', 'REGION_RATING_CLIENT',
       'REGION_RATING_CLIENT_W_CITY', 'WEEKDAY_APPR_PROCESS_START',
       'REG_REGION_NOT_LIVE_REGION', 'FLAG_MOBIL', 'FLAG_EMP_PHONE',
       'FLAG_WORK_PHONE', 'FLAG_CONT_MOBILE', 'TARGET',
       'LIVE_REGION_NOT_WORK_REGION', 'FLAG_EMAIL', 'FLAG_PHONE',
       'REG_CITY_NOT_LIVE_CITY', 'REG_CITY_NOT_WORK_CITY',
       'LIVE_CITY_NOT_WORK_CITY', 'REG_REGION_NOT_WORK_REGION',
       'FLAG_DOCUMENT_4', 'FLAG_DOCUMENT_5', 'FLAG_DOCUMENT_2',
       'FLAG_DOCUMENT_3', 'FLAG_DOCUMENT_11', 'FLAG_DOCUMENT_10',
       'FLAG_DOCUMENT_9', 'FLAG_DOCUMENT_8', 'FLAG_DOCUMENT_7',
       'FLAG_DOCUMENT_6', 'FLAG_DOCUMENT_12', 'FLAG_DOCUMENT_13',
       'FLAG_DOCUMENT_19', 'FLAG_DOCUMENT_18', 'FLAG_DOCUMENT_17',
       'FLAG_DOCUMENT_16', 'FLAG_DOCUMENT_15', 'FLAG_DOCUMENT_14',
       'FLAG_DOCUMENT_20', 'FLAG_DOCUMENT_21'], dtype=object)
In [25]:
confusion_matrix = pd.crosstab(df_loan_train["TARGET"], df_loan_train["NAME_CONTRACT_TYPE"])
print(confusion_matrix)
f_aux.cramers_v(confusion_matrix.values)
NAME_CONTRACT_TYPE  Cash loans  Revolving loans
TARGET                                         
0                       203995            22153
1                        18578             1282
Out[25]:
np.float64(0.030907639971484904)
In [26]:
confusion_matrix = pd.crosstab(df_loan_train["TARGET"], df_loan_train["TARGET"])
f_aux.cramers_v(confusion_matrix.values)
Out[26]:
np.float64(0.9999726127135284)
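The implementation of `f_aux.cramers_v` is not shown in this notebook. As a rough, self-contained sketch of how such a statistic can be computed, the hypothetical function `cramers_v_sketch` below assumes a chi-square statistic from `scipy.stats.chi2_contingency` (which applies Yates' continuity correction to 2×2 tables by default) together with the Bergsma bias correction. Both choices are assumptions rather than confirmed details of the author's helper. The counts are the ones printed in the TARGET versus NAME_CONTRACT_TYPE crosstab above.

```python
import numpy as np
from scipy.stats import chi2_contingency


def cramers_v_sketch(table):
    """Bias-corrected Cramér's V for a contingency table (illustrative sketch)."""
    table = np.asarray(table, dtype=float)
    # chi2_contingency applies Yates' correction to 2x2 tables by default
    chi2, _, _, _ = chi2_contingency(table)
    n = table.sum()
    r, k = table.shape
    phi2 = chi2 / n
    # Bergsma bias correction of phi^2 and of the table dimensions
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    return float(np.sqrt(phi2_corr / min(k_corr - 1, r_corr - 1)))


# TARGET vs NAME_CONTRACT_TYPE, counts taken from the crosstab output
target_vs_contract = [[203995, 22153], [18578, 1282]]
# TARGET vs itself: all observations fall on the diagonal
target_vs_target = [[226148, 0], [0, 19860]]

v_contract = cramers_v_sketch(target_vs_contract)
v_self = cramers_v_sketch(target_vs_target)
print(v_contract, v_self)
```

Under these assumptions, the continuity correction is one plausible reason why the association of TARGET with itself comes out marginally below 1 instead of exactly 1.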
In [27]:
corr_cats = f_aux.corr_cat(df=df_cat_bool, target='TARGET' ,target_transform=True)
corr_cats
Out[27]:
FONDKAPREMONT_MODE WALLSMATERIAL_MODE HOUSETYPE_MODE EMERGENCYSTATE_MODE OCCUPATION_TYPE NAME_TYPE_SUITE NAME_CONTRACT_TYPE FLAG_OWN_CAR CODE_GENDER NAME_INCOME_TYPE NAME_FAMILY_STATUS NAME_HOUSING_TYPE NAME_EDUCATION_TYPE FLAG_OWN_REALTY WEEKDAY_APPR_PROCESS_START TARGET
FONDKAPREMONT_MODE 1.000000 0.350355 0.395598 0.461219 0.031081 0.016261 0.020674 0.014879 0.012727 0.028382 0.025122 0.030463 0.043584 0.017184 0.004833 0.031273
WALLSMATERIAL_MODE 0.350355 1.000000 0.559244 0.687233 0.032294 0.013907 0.026873 0.036372 0.021165 0.030616 0.037674 0.043419 0.062620 0.029568 0.003571 0.043847
HOUSETYPE_MODE 0.395598 0.559244 1.000000 0.669155 0.045573 0.019460 0.025535 0.034234 0.020464 0.041969 0.044066 0.045154 0.066813 0.024300 0.002887 0.040570
EMERGENCYSTATE_MODE 0.461219 0.687233 0.669155 1.000000 0.055708 0.025321 0.025932 0.036046 0.021761 0.053056 0.055242 0.060189 0.085016 0.023221 0.005764 0.042562
OCCUPATION_TYPE 0.031081 0.032294 0.045573 0.055708 1.000000 0.020667 0.059424 0.256661 0.360044 0.289179 0.101781 0.043685 0.187258 0.048577 0.017458 0.080468
NAME_TYPE_SUITE 0.016261 0.013907 0.019460 0.025321 0.020667 1.000000 0.031827 0.043771 0.045054 0.020313 0.067381 0.018877 0.024132 0.073124 0.016550 0.011804
NAME_CONTRACT_TYPE 0.020674 0.026873 0.025535 0.025932 0.059424 0.031827 0.999976 0.004439 0.013319 0.061202 0.047842 0.027278 0.066602 0.066282 0.015337 0.030908
FLAG_OWN_CAR 0.014879 0.036372 0.034234 0.036046 0.256661 0.043771 0.004439 0.999991 0.344354 0.156715 0.167807 0.040472 0.097498 0.000000 0.002167 0.021434
CODE_GENDER 0.012727 0.021165 0.020464 0.021761 0.360044 0.045054 0.013319 0.344354 1.000000 0.120373 0.118466 0.045401 0.017458 0.042422 0.004928 0.056410
NAME_INCOME_TYPE 0.028382 0.030616 0.041969 0.053056 0.289179 0.020313 0.061202 0.156715 0.120373 1.000000 0.127084 0.054641 0.103450 0.070855 0.012216 0.063819
NAME_FAMILY_STATUS 0.025122 0.037674 0.044066 0.055242 0.101781 0.067381 0.047842 0.167807 0.118466 0.127084 1.000000 0.076732 0.052688 0.053152 0.005424 0.040386
NAME_HOUSING_TYPE 0.030463 0.043419 0.045154 0.060189 0.043685 0.018877 0.027278 0.040472 0.045401 0.054641 0.076732 1.000000 0.041538 0.226680 0.004623 0.035852
NAME_EDUCATION_TYPE 0.043584 0.062620 0.066813 0.085016 0.187258 0.024132 0.066602 0.097498 0.017458 0.103450 0.052688 0.041538 1.000000 0.030433 0.004408 0.056539
FLAG_OWN_REALTY 0.017184 0.029568 0.024300 0.023221 0.048577 0.073124 0.066282 0.000000 0.042422 0.070855 0.053152 0.226680 0.030433 0.999990 0.025975 0.005191
WEEKDAY_APPR_PROCESS_START 0.004833 0.003571 0.002887 0.005764 0.017458 0.016550 0.015337 0.002167 0.004928 0.012216 0.005424 0.004623 0.004408 0.025975 1.000000 0.006202
TARGET 0.031273 0.043847 0.040570 0.042562 0.080468 0.011804 0.030908 0.021434 0.056410 0.063819 0.040386 0.035852 0.056539 0.005191 0.006202 0.999973

Next, I will plot Cramér's V for the pure categorical variables, that is, the variables whose columns contain text (strings). I chose to plot it as a heatmap, rather than present it as a table like the earlier numerical correlation matrix, for two reasons:

  1. There are fewer categorical variables than numerical ones, so a plot is more descriptive and conveys more information. The numerical variables shown earlier are numerous, and a correlation table makes them easier to interpret.

  2. To put into practice the knowledge acquired in the course and adapt the visualizations in a different way, trying to improve them and make them more interpretable with my own code and functions.

In [28]:
plt.figure(figsize=(15,8))
sns.heatmap(corr_cats, annot=True, fmt='.3f', cmap='YlGnBu')
plt.title('Cramers V Matrix', fontdict={'size':'17'})
plt.show()
[Figure: heatmap titled 'Cramers V Matrix']
In [29]:
warnings.filterwarnings("ignore")

corr_bool = f_aux.corr_cat_boolean(df_loan_train[df_loan_bool])
corr_bool
Out[29]:
REG_REGION_NOT_LIVE_REGION FLAG_MOBIL FLAG_EMP_PHONE FLAG_WORK_PHONE FLAG_CONT_MOBILE TARGET LIVE_REGION_NOT_WORK_REGION FLAG_EMAIL FLAG_PHONE REG_CITY_NOT_LIVE_CITY REG_CITY_NOT_WORK_CITY LIVE_CITY_NOT_WORK_CITY REG_REGION_NOT_WORK_REGION FLAG_DOCUMENT_4 FLAG_DOCUMENT_5 FLAG_DOCUMENT_2 FLAG_DOCUMENT_3 FLAG_DOCUMENT_11 FLAG_DOCUMENT_10 FLAG_DOCUMENT_9 FLAG_DOCUMENT_8 FLAG_DOCUMENT_7 FLAG_DOCUMENT_6 FLAG_DOCUMENT_12 FLAG_DOCUMENT_13 FLAG_DOCUMENT_19 FLAG_DOCUMENT_18 FLAG_DOCUMENT_17 FLAG_DOCUMENT_16 FLAG_DOCUMENT_15 FLAG_DOCUMENT_14 FLAG_DOCUMENT_20 FLAG_DOCUMENT_21
REG_REGION_NOT_LIVE_REGION 0.999864 0.000000 0.036448 0.064337 0.000000 0.006935 0.087785 0.017924 0.000000 0.340818 0.143234 0.008111 0.451288 0.000000 0.008362 0.000000 0.033300 0.101413 0.000000 0.017521 0.023594 0.000000 0.024039 0.000000 0.004033 0.000000 0.006931 0.002121 0.005909 0.000000 0.000000 0.000382 0.005234
FLAG_MOBIL 0.000000 0.499995 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.002332 0.000000 0.000000 0.000000 0.000000 0.000000 0.010781 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
FLAG_EMP_PHONE 0.036448 0.000000 0.999986 0.234463 0.012614 0.046295 0.096460 0.062746 0.016040 0.091404 0.256401 0.219311 0.108447 0.000000 0.019342 0.000000 0.249609 0.029127 0.000000 0.023975 0.121884 0.000000 0.597870 0.000000 0.026345 0.009381 0.040891 0.006542 0.043219 0.014710 0.022684 0.009618 0.008300
FLAG_WORK_PHONE 0.064337 0.000000 0.234463 0.999987 0.021575 0.028445 0.041709 0.012943 0.292630 0.046976 0.121463 0.109409 0.068454 0.003392 0.035973 0.000000 0.060310 0.122612 0.000000 0.008575 0.021704 0.000000 0.138852 0.000000 0.000000 0.012102 0.033436 0.000000 0.005636 0.005112 0.002172 0.000000 0.000000
FLAG_CONT_MOBILE 0.000000 0.000000 0.012614 0.021575 0.998925 0.000000 0.002953 0.006254 0.004956 0.000000 0.001274 0.001813 0.000371 0.000000 0.004604 0.000000 0.003629 0.000000 0.000000 0.005182 0.020224 0.000000 0.010232 0.000000 0.073307 0.000000 0.042001 0.013182 0.024330 0.015954 0.061699 0.000000 0.016460
TARGET 0.006935 0.000000 0.046295 0.028445 0.000000 0.999973 0.002928 0.000000 0.023297 0.047109 0.051015 0.031486 0.008190 0.000000 0.000000 0.000965 0.044312 0.003981 0.000000 0.003394 0.007536 0.001214 0.028506 0.000000 0.011202 0.000000 0.007344 0.001771 0.010573 0.005808 0.009598 0.000000 0.003850
LIVE_REGION_NOT_WORK_REGION 0.087785 0.000000 0.096460 0.041709 0.002953 0.002928 0.999948 0.026898 0.005378 0.021181 0.186773 0.237361 0.859998 0.000000 0.012922 0.000000 0.010027 0.006539 0.000000 0.016361 0.058313 0.000000 0.058634 0.000000 0.015981 0.001422 0.002781 0.000000 0.003221 0.000632 0.012998 0.000000 0.000000
FLAG_EMAIL 0.017924 0.000000 0.062746 0.012943 0.006254 0.000000 0.026898 0.999962 0.015122 0.013158 0.003776 0.001991 0.031887 0.001767 0.001362 0.002537 0.011490 0.002866 0.000000 0.007607 0.029476 0.000000 0.042982 0.000000 0.001790 0.002508 0.007093 0.000000 0.013565 0.003596 0.000000 0.001860 0.001012
FLAG_PHONE 0.000000 0.000000 0.016040 0.292630 0.004956 0.023297 0.005378 0.015122 0.999990 0.047743 0.046570 0.024331 0.003734 0.004442 0.073286 0.000000 0.008268 0.003115 0.002628 0.013540 0.003610 0.011741 0.011079 0.000000 0.006926 0.007866 0.003753 0.002095 0.009108 0.007502 0.010039 0.000000 0.000000
REG_CITY_NOT_LIVE_CITY 0.340818 0.000000 0.091404 0.046976 0.000000 0.047109 0.021181 0.013158 0.047743 0.999972 0.439100 0.027484 0.152670 0.000000 0.000000 0.000000 0.001120 0.053924 0.002451 0.005587 0.017619 0.000000 0.058014 0.000000 0.000000 0.004914 0.014584 0.000000 0.011526 0.000000 0.003763 0.000000 0.000884
REG_CITY_NOT_WORK_CITY 0.143234 0.000000 0.256401 0.121463 0.001274 0.051015 0.186773 0.003776 0.046570 0.439100 0.999989 0.826356 0.240999 0.002642 0.011211 0.001045 0.056931 0.033442 0.000000 0.000000 0.041897 0.000000 0.157609 0.000000 0.000000 0.003755 0.013745 0.000000 0.002251 0.000000 0.005554 0.000000 0.003860
LIVE_CITY_NOT_WORK_CITY 0.008111 0.000000 0.219311 0.109409 0.001813 0.031486 0.237361 0.001991 0.024331 0.027484 0.826356 0.999986 0.197088 0.003018 0.012873 0.000000 0.055402 0.000962 0.000000 0.003068 0.041488 0.000000 0.133315 0.000000 0.000000 0.000000 0.004838 0.000000 0.003102 0.000000 0.004788 0.000000 0.004583
REG_REGION_NOT_WORK_REGION 0.451288 0.000000 0.108447 0.068454 0.000371 0.008190 0.859998 0.031887 0.003734 0.152670 0.240999 0.197088 0.999958 0.000000 0.015266 0.000000 0.020457 0.057097 0.000000 0.021675 0.058723 0.000000 0.066608 0.000000 0.012478 0.000000 0.005906 0.000000 0.000000 0.000000 0.011965 0.000000 0.000000
FLAG_DOCUMENT_4 0.000000 0.000000 0.000000 0.003392 0.000000 0.000000 0.000000 0.001767 0.004442 0.000000 0.002642 0.003018 0.000000 0.974998 0.000000 0.000000 0.013447 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
FLAG_DOCUMENT_5 0.008362 0.000000 0.019342 0.035973 0.004604 0.000000 0.012922 0.001362 0.073286 0.000000 0.011211 0.012873 0.015266 0.000000 0.999864 0.000000 0.194000 0.007206 0.000000 0.007197 0.036877 0.000000 0.038528 0.000000 0.006870 0.001004 0.010199 0.000000 0.011751 0.003237 0.005989 0.000356 0.000000
FLAG_DOCUMENT_2 0.000000 0.000000 0.000000 0.000000 0.000000 0.000965 0.000000 0.002537 0.000000 0.000000 0.001045 0.000000 0.000000 0.000000 0.000000 0.937498 0.007874 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
FLAG_DOCUMENT_3 0.033300 0.000000 0.249609 0.060310 0.003629 0.044312 0.010027 0.011490 0.008268 0.001120 0.056931 0.055402 0.020457 0.013447 0.194000 0.007874 0.999990 0.092587 0.006508 0.097381 0.465736 0.020713 0.486462 0.000000 0.019381 0.008766 0.007172 0.002696 0.034111 0.000000 0.000000 0.007105 0.026226
FLAG_DOCUMENT_11 0.101413 0.000000 0.029127 0.122612 0.000000 0.003981 0.006539 0.002866 0.003115 0.053924 0.033442 0.000962 0.057097 0.000000 0.007206 0.000000 0.092587 0.999474 0.000000 0.002696 0.018132 0.000000 0.018282 0.000000 0.002476 0.000000 0.004932 0.000000 0.005564 0.000000 0.001828 0.000000 0.000000
FLAG_DOCUMENT_10 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.002628 0.002451 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.006508 0.000000 0.916664 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
FLAG_DOCUMENT_9 0.017521 0.000000 0.023975 0.008575 0.005182 0.003394 0.016361 0.007607 0.013540 0.005587 0.000000 0.003068 0.021675 0.000000 0.007197 0.000000 0.097381 0.002696 0.000000 0.999473 0.018352 0.000000 0.019188 0.000000 0.000000 0.000000 0.000000 0.004431 0.007011 0.000000 0.000000 0.000000 0.000000
FLAG_DOCUMENT_8 0.023594 0.002332 0.121884 0.021704 0.020224 0.007536 0.058313 0.029476 0.003610 0.017619 0.041897 0.041488 0.058723 0.000000 0.036877 0.000000 0.465736 0.018132 0.000000 0.018352 0.999973 0.002838 0.092694 0.000000 0.076036 0.000000 0.008336 0.002902 0.011595 0.021195 0.029604 0.002940 0.000000
FLAG_DOCUMENT_7 0.000000 0.000000 0.000000 0.000000 0.000000 0.001214 0.000000 0.000000 0.011741 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.020713 0.000000 0.000000 0.000000 0.002838 0.988887 0.003079 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
FLAG_DOCUMENT_6 0.024039 0.000000 0.597870 0.138852 0.010232 0.028506 0.058634 0.042982 0.011079 0.058014 0.157609 0.133315 0.066608 0.000000 0.038528 0.000000 0.486462 0.018282 0.000000 0.019188 0.092694 0.003079 0.999975 0.000000 0.017694 0.004891 0.024562 0.003318 0.026666 0.009094 0.013654 0.005685 0.005088
FLAG_DOCUMENT_12 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.499995 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
FLAG_DOCUMENT_13 0.004033 0.000000 0.026345 0.000000 0.073307 0.011202 0.015981 0.001790 0.006926 0.000000 0.000000 0.000000 0.012478 0.000000 0.006870 0.000000 0.019381 0.002476 0.000000 0.000000 0.076036 0.000000 0.017694 0.000000 0.999430 0.000000 0.004668 0.000000 0.005282 0.000000 0.001580 0.045628 0.003804
FLAG_DOCUMENT_19 0.000000 0.000000 0.009381 0.012102 0.000000 0.000000 0.001422 0.002508 0.007866 0.004914 0.003755 0.000000 0.000000 0.000000 0.001004 0.000000 0.008766 0.000000 0.000000 0.000000 0.000000 0.000000 0.004891 0.000000 0.000000 0.996401 0.000000 0.000000 0.000000 0.000000 0.000000 0.026062 0.000000
FLAG_DOCUMENT_18 0.006931 0.010781 0.040891 0.033436 0.042001 0.007344 0.002781 0.007093 0.003753 0.014584 0.013745 0.004838 0.005906 0.000000 0.010199 0.000000 0.007172 0.004932 0.000000 0.000000 0.008336 0.000000 0.024562 0.000000 0.004668 0.000000 0.999751 0.000000 0.008665 0.001482 0.003964 0.077061 0.006343
FLAG_DOCUMENT_17 0.002121 0.000000 0.006542 0.000000 0.013182 0.001771 0.000000 0.000000 0.002095 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.002696 0.000000 0.000000 0.004431 0.002902 0.000000 0.003318 0.000000 0.000000 0.000000 0.000000 0.992645 0.000000 0.000000 0.000000 0.015840 0.000000
FLAG_DOCUMENT_16 0.005909 0.000000 0.043219 0.005636 0.024330 0.010573 0.003221 0.013565 0.009108 0.011526 0.002251 0.003102 0.000000 0.000000 0.011751 0.000000 0.034111 0.005564 0.000000 0.007011 0.011595 0.000000 0.026666 0.000000 0.005282 0.000000 0.008665 0.000000 0.999793 0.002040 0.004533 0.084537 0.000000
FLAG_DOCUMENT_15 0.000000 0.000000 0.014710 0.005112 0.015954 0.005808 0.000632 0.003596 0.007502 0.000000 0.000000 0.000000 0.000000 0.000000 0.003237 0.000000 0.000000 0.000000 0.000000 0.000000 0.021195 0.000000 0.009094 0.000000 0.000000 0.000000 0.001482 0.000000 0.002040 0.998297 0.000000 0.022722 0.000000
FLAG_DOCUMENT_14 0.000000 0.000000 0.022684 0.002172 0.061699 0.009598 0.012998 0.000000 0.010039 0.003763 0.005554 0.004788 0.011965 0.000000 0.005989 0.000000 0.000000 0.001828 0.000000 0.000000 0.029604 0.000000 0.013654 0.000000 0.001580 0.000000 0.003964 0.000000 0.004533 0.000000 0.999284 0.027626 0.000000
FLAG_DOCUMENT_20 0.000382 0.000000 0.009618 0.000000 0.000000 0.000000 0.000000 0.001860 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000356 0.000000 0.007105 0.000000 0.000000 0.000000 0.002940 0.000000 0.005685 0.000000 0.045628 0.026062 0.077061 0.015840 0.084537 0.022722 0.027626 0.995966 0.003904
FLAG_DOCUMENT_21 0.005234 0.000000 0.008300 0.000000 0.016460 0.003850 0.000000 0.001012 0.000000 0.000884 0.003860 0.004583 0.000000 0.000000 0.000000 0.000000 0.026226 0.000000 0.000000 0.000000 0.000000 0.000000 0.005088 0.000000 0.003804 0.000000 0.006343 0.000000 0.000000 0.000000 0.000000 0.003904 0.994251

I split the Cramér's V plots into two groups: purely categorical variables, whose columns contain text, and boolean variables. Although boolean variables hold numeric values, they can only take two values, 0 or 1, so they behave as categorical variables with two levels.

For this reason, Pearson correlation is not the appropriate measure for studying the association between this type of variable and TARGET.
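To make the measure concrete, here is a minimal sketch of a plain (uncorrected) Cramér's V between two label sequences. It is not the implementation in `funciones_auxiliares`, which is not shown in this notebook; the function name `cramers_v` and the toy flags are made up for illustration, and only the standard library is used so the snippet runs on its own.

```python
from collections import Counter
from math import sqrt


def cramers_v(x, y):
    """Plain (uncorrected) Cramér's V between two equally long label sequences."""
    n = len(x)
    joint = Counter(zip(x, y))  # observed count of every (x, y) cell
    x_tot = Counter(x)          # row totals
    y_tot = Counter(y)          # column totals
    # Pearson chi-square: sum over cells of (observed - expected)^2 / expected
    chi2 = 0.0
    for xv, nx in x_tot.items():
        for yv, ny in y_tot.items():
            expected = nx * ny / n
            chi2 += (joint[(xv, yv)] - expected) ** 2 / expected
    k = min(len(x_tot), len(y_tot)) - 1
    # A constant variable has no association to measure
    return sqrt(chi2 / (n * k)) if k > 0 else 0.0


# Toy 0/1 flags (made-up values, not taken from the dataset)
flag_a = [0, 0, 0, 0, 1, 1, 1, 1]
flag_b = [0, 0, 0, 1, 1, 1, 1, 0]
print(cramers_v(flag_a, flag_a))  # a flag compared with itself
print(cramers_v(flag_a, flag_b))  # two partially related flags
```

In practice the chi-square statistic would usually come from `scipy.stats.chi2_contingency`, which this notebook imports. That function applies Yates' continuity correction to 2×2 tables by default, so a helper built on it can return slightly different values from this uncorrected version.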

In [30]:
plt.figure(figsize=(30,15))
sns.heatmap(corr_bool, annot=True, fmt='.3f', cmap='YlGnBu')
plt.title('Cramers V Matrix', fontdict={'size':'17'})
plt.show()

Although none of the categorical or boolean variables shows a strong association with our target variable, the highest value corresponds to OCCUPATION_TYPE, which we already discussed in the graphical analysis. Its Cramér's V is about 0.08; that is small, but the variable could still matter in the model.

There are also associations between 30% and 70% among variables such as housing type, building materials and other housing characteristics. These high values are not a concern, because the relationship between those variables is a logical one.

We also observe an association of 42.3% between the client's occupation and the type of organization they work for. A priori this is also an expected relationship and not a concern.

Weight of Evidence (WoE) and Information Value (IV)¶

WoE is a measure that maps each category of a categorical variable, or each bin of a continuous one, onto a scale that reflects the ratio between the distributions of the two classes of the dependent variable (for example, "fraud" and "no fraud"). It is computed as follows:

$$ \mathrm{WoE} = \ln\left(\frac{\text{Distribution of the positive class}}{\text{Distribution of the negative class}}\right) $$

Interpretation:

  • If WoE > 0, the category holds a larger share of the positives than of the negatives, so belonging to it points towards the positive class.
  • If WoE < 0, the category holds a larger share of the negatives than of the positives, so belonging to it points towards the negative class.
  • WoE = 0 means the category holds the same share of positives and negatives, so it carries little information.
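The definition above can be sketched in a few lines. This is an illustrative version, not the `calculate_woe_iv_categorical` helper used later in the notebook (its source is not shown); the function name `woe_by_category`, the toy data, and the choice to return 0 when a category has no positives or no negatives are assumptions of this sketch.

```python
from collections import Counter
from math import log


def woe_by_category(categories, target):
    """WoE per category: ln(share of positives / share of negatives)."""
    pos = Counter(c for c, t in zip(categories, target) if t == 1)
    neg = Counter(c for c, t in zip(categories, target) if t == 0)
    n_pos, n_neg = sum(pos.values()), sum(neg.values())
    woe = {}
    for cat in sorted(set(categories)):
        if pos[cat] == 0 or neg[cat] == 0:
            # The log ratio is undefined here; returning 0 is an assumption of this sketch
            woe[cat] = 0.0
        else:
            woe[cat] = log((pos[cat] / n_pos) / (neg[cat] / n_neg))
    return woe


# Toy data (made up): A is mostly positive, B mostly negative, C has no positives
cats = ["A"] * 4 + ["B"] * 4 + ["C"] * 2
y = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
woe = woe_by_category(cats, y)
print(woe)
```

Here category A holds 3 of the 4 positives but only 1 of the 6 negatives, so its WoE is positive; B is the reverse, so its WoE is negative.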


Information Value (IV) is a metric that quantifies the predictive power of a variable with respect to the target variable. It aggregates, across all groups, the differences between the proportions of positives and negatives.

IV is computed by summing the WoE values, each weighted by the difference between the proportions of positives and negatives in its group:

$$ \mathrm{IV} = \sum \left(\text{Proportion of the positive class} - \text{Proportion of the negative class}\right) \times \mathrm{WoE} $$

Interpretation of IV:

  • IV < 0.02: Low predictive power.
  • 0.02 < IV < 0.1: Weak predictive power.
  • 0.1 < IV < 0.3: Moderate predictive power.
  • 0.3 < IV < 0.5: High predictive power.
  • IV > 0.5: Very high predictive power (although care is needed not to overfit the model).
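As a rough illustration of the sum and of the bands listed above, the sketch below computes IV from per-category counts and maps the result to a band. The names `information_value` and `iv_band`, the toy counts, the skipping of categories with an empty class, and the assignment of exact boundary values to the upper band are all assumptions of this sketch, not part of the notebook's helper.

```python
from math import log


def information_value(pos_counts, neg_counts):
    """IV = sum over categories of (share of positives - share of negatives) * WoE."""
    n_pos, n_neg = sum(pos_counts.values()), sum(neg_counts.values())
    iv = 0.0
    for cat in pos_counts.keys() | neg_counts.keys():
        p = pos_counts.get(cat, 0) / n_pos
        q = neg_counts.get(cat, 0) / n_neg
        if p > 0 and q > 0:  # categories with an empty class are skipped (assumption)
            iv += (p - q) * log(p / q)
    return iv


def iv_band(iv):
    """Map an IV to a band; exact boundary values go to the upper band (assumption)."""
    if iv < 0.02:
        return "low"
    if iv < 0.1:
        return "weak"
    if iv < 0.3:
        return "moderate"
    if iv < 0.5:
        return "high"
    return "very high"


# Toy counts (made up): category -> number of positives / negatives
pos = {"A": 30, "B": 70}
neg = {"A": 50, "B": 50}
iv = information_value(pos, neg)
print(round(iv, 4), iv_band(iv))
```

Every term of the sum is non-negative, because the difference of shares and the WoE always have the same sign, so IV can only grow as categories separate the two classes more clearly.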

Next, we compute WoE and IV for a few categorical variables that I find interesting, and then discuss the conclusions.

In [33]:
woe_dict, iv = f_aux.calculate_woe_iv_categorical(df=df_loan_train, variable='OCCUPATION_TYPE', target='TARGET')

print("WoE por categoría:", woe_dict)
print("IV de la variable:", iv)
WoE por categoría: {'Accountants': np.float64(-0.5400057046334321), 'Cleaning staff': np.float64(0.20125383936768831), 'Cooking staff': np.float64(0.317780773417085), 'Core staff': np.float64(-0.2541654427300936), 'Drivers': np.float64(0.3876294694822735), 'HR staff': np.float64(-0.3778470566427261), 'High skill tech staff': np.float64(-0.29303383912994113), 'IT staff': np.float64(-0.11435561393858834), 'Laborers': np.float64(0.29302322513685075), 'Low-skill Laborers': np.float64(0.8619870550944092), 'Managers': np.float64(-0.3019227627931952), 'Medicine staff': np.float64(-0.2105095609540891), 'Private service staff': np.float64(-0.2755682075223078), 'Realty agents': np.float64(-0.039935644637323194), 'Sales staff': np.float64(0.20619282273186823), 'Secretaries': np.float64(-0.14368624748642267), 'Security staff': np.float64(0.33001694046914937), 'Waiters/barmen staff': np.float64(0.33623333358368496), 'Desconocido': np.float64(-0.24043809051046666)}
IV de la variable: 0.08587967416283065
In [34]:
woe_dict, iv = f_aux.calculate_woe_iv_categorical(df=df_loan_train, variable='NAME_INCOME_TYPE', target='TARGET')

print("WoE por categoría:", woe_dict)
print("IV de la variable:", iv)
WoE por categoría: {'Businessman': 0, 'Commercial associate': np.float64(-0.08014465465518467), 'Maternity leave': np.float64(2.4324819935799025), 'Pensioner': np.float64(-0.43494908113169145), 'State servant': np.float64(-0.35974790110068705), 'Student': 0, 'Unemployed': np.float64(2.027016885471738), 'Working': np.float64(0.1878468975898912), 'Desconocido': 0}
IV de la variable: 0.05808599223176106
In [35]:
woe_dict, iv = f_aux.calculate_woe_iv_categorical(df=df_loan_train, variable='NAME_EDUCATION_TYPE', target='TARGET')

print("WoE por categoría:", woe_dict)
print("IV de la variable:", iv)
WoE por categoría: {'Academic degree': np.float64(-2.4503199290064686), 'Higher education': np.float64(-0.4393091653969691), 'Incomplete higher': np.float64(0.05657720465626268), 'Lower secondary': np.float64(0.3385501370579372), 'Secondary / secondary special': np.float64(0.11163766773089984), 'Desconocido': 0}
IV de la variable: 0.05154040418506241
In [36]:
woe_dict, iv = f_aux.calculate_woe_iv_categorical(df=df_loan_train, variable='CODE_GENDER', target='TARGET')

print("WoE por categoría:", woe_dict)
print("IV de la variable:", iv)
WoE por categoría: {'F': np.float64(-0.1579356962666567), 'M': np.float64(0.2556181541980332), 'XNA': 0, 'Desconocido': 0}
IV de la variable: 0.040237003552605975

My conclusions for the 4 variables analysed:

  • In 'OCCUPATION_TYPE', less qualified jobs have a positive WoE, and the larger the coefficient, the larger the proportion of TARGET = 1 in that job. Clients with low-skill jobs such as 'Low-skill Laborers', 'Drivers', 'Security staff' or 'Waiters/barmen staff' therefore show a higher proportion of TARGET = 1 (payment difficulties). Conversely, clients with more qualified jobs have negative coefficients, which means the category has a higher proportion of clients with TARGET = 0.

  • In 'NAME_INCOME_TYPE', 'Unemployed' and 'Maternity leave' have large positive coefficients, so they point strongly towards TARGET = 1 (payment difficulties). 'Pensioner' and 'State servant', on the other hand, have negative coefficients, which means the category has a higher proportion of clients with TARGET = 0. 'Businessman' has a value of 0. Taken at face value, this would mean a balanced distribution between positives and negatives; however, it is printed as a plain 0 rather than a computed float, which suggests it may instead be a default value for a category with too few observations in one of the classes. This should be checked against the category counts.

  • In 'NAME_EDUCATION_TYPE', clients with a higher level of education have negative coefficients and clients with a lower level of education have positive coefficients. In principle, this is a logical result.

  • 'CODE_GENDER' is an interesting variable: men ('M') have a higher coefficient than women ('F'), so a priori male clients are over-represented among those with TARGET = 1 (payment difficulties).


As for the IV, all four values fall in the interval 0.02 < IV < 0.1, so each variable on its own has weak predictive power. This is to be expected: several variables usually need to be combined to reach strong predictive power, and a single variable with very high predictive power over the target would deserve scrutiny, since it could lead to problems such as overfitting or bias.

Exporting the datasets¶

In [31]:
print(df_loan_train.shape, df_loan_test.shape)
(246008, 122) (61503, 122)
In [32]:
df_loan_train.to_csv('../../data_loan_status/interim/data_split/df_loan_train.csv', index=False) 
df_loan_test.to_csv('../../data_loan_status/interim/data_split/df_loan_test.csv', index=False)

EDA conclusions¶

The question posed for this assignment is: is there a type of client more likely not to repay a loan? Based on our exploratory data analysis we can put forward an initial hypothesis about which type of client that would be. Note that this client profile is my own hypothesis, based on the statistics visualised in the EDA, and we will be able to test it during feature engineering and modelling, where we will revisit whether or not it holds.

According to the exploratory analysis carried out in the first 2 notebooks, we can expect the type of client who will have difficulties paying or fully repaying the loan to be:

  • A client with a low level of education
  • Who owns an old car
  • With a low-skill job
  • Whose home is built with poor materials, especially wood
  • With a large family, with more than 2 children
  • Who is unemployed or on leave

Later, during feature engineering and modelling, we will check whether this initial hypothesis, based on my interpretation of the statistics computed and visualised here, holds.

Through this exploratory data analysis we have covered:

  1. A deep understanding of our data and of the business problem.
  2. Importing the data, checking its dimensions, and splitting and identifying the different variable types, with a visualisation of each.
  3. Detecting, plotting and analysing our target variable, concluding that it is clearly imbalanced.
  4. Splitting the dataset into train and test in a stratified way, because of the imbalance in the target variable.
  5. Descriptive visualisation of our variables, to understand their nature, their distribution and their importance for the target variable.
  6. Treatment of outliers, understanding their importance and the impact they could have in the modelling phase.
  7. Treatment of missing values in every data type (numeric, boolean and categorical), learning about and reflecting on the different imputation methods, and observing how they affect the distribution and descriptive statistics of our variables.
  8. Correlation analysis of the variables, to understand how a high correlation affects our target variable.

All of this showed us that we are working with a dataset containing many variables of different types, with which we aim to explain and predict the behaviour of our target variable, that is, when a client is likely to have difficulties repaying a loan.

With these conclusions, we face a complex problem that will be challenging in terms of model performance: the simplest model of all would predict that no client has payment difficulties, and it would be wrong only 8.07% of the time. The goal is to improve on that baseline by adding complexity to our analysis.
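The snippet below illustrates why that trivial baseline is hard to beat on accuracy yet useless for the business question. The labels are made up with roughly the same imbalance as TARGET (8 positives out of 100); they are not taken from the dataset.

```python
# Made-up labels: 8 clients with payment difficulties out of 100
y_true = [1] * 8 + [0] * 92
# Trivial baseline: predict "no payment difficulties" for everyone
y_pred = [0] * len(y_true)

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
true_pos = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
recall = true_pos / sum(y_true)  # share of real positives that were detected

print("accuracy:", accuracy)
print("recall:", recall)
```

The baseline is right for every client in the majority class but detects none of the clients with payment difficulties, so metrics that focus on the positive class are worth tracking alongside accuracy.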

Things to keep in mind when running models:¶

  • It may be necessary to balance the training data, for example with oversampling techniques.
  • Some variables were identified as important for predicting payment difficulties, such as OCCUPATION_TYPE (occupation), NAME_EDUCATION_TYPE (level of education), NAME_INCOME_TYPE (pensioner, student, working), CNT_CHILDREN (number of children, a proxy for family size), among others.
  • Consider Mean Encoding instead of One-Hot Encoding for categorical variables with many categories.
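As a pointer for the last item, here is a minimal sketch of mean (target) encoding with smoothing. The function names, the `smoothing` parameter and the toy values are hypothetical; in the notebook this would more naturally be done with a pandas `groupby` on the training set. The encoding is learned from training data only, to avoid leaking the target from the test set.

```python
from collections import defaultdict


def fit_mean_encoding(categories, target, smoothing=10.0):
    """Learn a smoothed target mean per category from training data only."""
    global_mean = sum(target) / len(target)
    sums, counts = defaultdict(float), defaultdict(int)
    for cat, t in zip(categories, target):
        sums[cat] += t
        counts[cat] += 1
    # Shrink each category mean towards the global mean; rare categories shrink the most
    encoding = {
        cat: (sums[cat] + smoothing * global_mean) / (counts[cat] + smoothing)
        for cat in counts
    }
    return encoding, global_mean


def apply_mean_encoding(categories, encoding, global_mean):
    """Replace each category by its learned value; unseen categories get the global mean."""
    return [encoding.get(cat, global_mean) for cat in categories]


# Toy training data (made-up target values)
train_cats = ["Drivers"] * 4 + ["Managers"] * 4
train_y = [1, 1, 0, 0, 0, 0, 0, 0]
encoding, global_mean = fit_mean_encoding(train_cats, train_y, smoothing=4.0)
encoded = apply_mean_encoding(["Drivers", "Managers", "Other"], encoding, global_mean)
print(encoded)
```

Unlike One-Hot Encoding, this produces a single numeric column regardless of how many categories the variable has, which is the motivation given in the list above.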